Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- HybridCodec: Modeling Discrete and Continuous Representations for Efficient Speech Language Models
  HybridCodec combines discrete tokens with continuous residuals via a focal modulation codec and hybrid Transformer to improve speaker retention and reduce autoregressive steps in speech language models.
- Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
  Introduces XLSR-Thai encoder, U-Align alignment, and Thai-SUP data pipeline to enable multitask speech understanding SLLMs for Thai.