EmoInstruct-TTS: Dual-Path Instruction-Guided Emotional Speech Synthesis

2026 · cs.CL · arXiv 2606.20650

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Instruction-based controllable speech synthesis enables users to specify emotions through natural language. However, existing approaches often rely on coarse emotion labels and lack explicit modeling of fine-grained intensity. We propose EmoInstruct-TTS, a dual-path instruction-guided framework for emotional speech synthesis. We introduce Emotion2embed, a supervised semantic-acoustic emotion embedding covering 48 emotional states, including fine-grained categories and intensity levels. To infer embeddings from free-form instructions, we design an Instruction-Conditioned Emotion Flow Model (ICE-Flow) that generates acoustically grounded emotion representations. The inferred embeddings are integrated into an LLM-based synthesis pipeline to provide explicit emotional control while preserving semantic planning. Experiments show improved emotional controllability and speech naturalness over strong baselines.

representative citing papers

EmoInstruct-TTS: Dual-Path Instruction-Guided Emotional Speech Synthesis

cs.CL · 2026-06-08 · unverdicted · novelty 6.0

EmoInstruct-TTS uses Emotion2embed and an Instruction-Conditioned Emotion Flow Model (ICE-Flow) to generate acoustically grounded emotion representations from free-form instructions and integrate them into an LLM-based TTS pipeline.

citing papers explorer

Showing 1 of 1 citing paper.

EmoInstruct-TTS: Dual-Path Instruction-Guided Emotional Speech Synthesis cs.CL · 2026-06-08 · unverdicted · none · ref 1 · internal anchor
