Continuous speech synthesis using per-token latent diffusion. arXiv preprint arXiv:2410.16048.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: eess.AS (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
MELD jointly optimizes a discrete latent variable encoder on mel-spectrograms with an autoregressive speech LM, claiming gains over codec and mel baselines on zero-shot TTS/STT plus fewer autoregressive artifacts.