Chain-of-Details (CoD) is a cascaded TTS method that explicitly models temporal coarse-to-fine dynamics with a shared decoder, achieving competitive performance using significantly fewer parameters.
High- fidelity audio compression with improved rvqgan,
2 Pith papers cite this work. Polarity classification is still indexing.
fields
eess.AS 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
A timing-aware pre-quantization fusion approach integrates visual cues into audio tokenizers along the temporal axis, maintaining reconstruction quality while outperforming audio-only and prior multimodal baselines on downstream tasks.
citing papers explorer
-
Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation
Chain-of-Details (CoD) is a cascaded TTS method that explicitly models temporal coarse-to-fine dynamics with a shared decoder, achieving competitive performance using significantly fewer parameters.
-
Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization
A timing-aware pre-quantization fusion approach integrates visual cues into audio tokenizers along the temporal axis, maintaining reconstruction quality while outperforming audio-only and prior multimodal baselines on downstream tasks.