PlanAudio introduces a unified autoregressive LLM framework with semantic latent chain-of-thought for generating composite speech and sound audio from free-form text, plus a new benchmark.
Ov-instructtts: Towards open-vocabulary instruct text-to-speech,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SD 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Unified Synthesis of Compositional Speech and Sound from Free-Form Text Prompts
PlanAudio introduces a unified autoregressive LLM framework with semantic latent chain-of-thought for generating composite speech and sound audio from free-form text, plus a new benchmark.