Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
HybridCodec combines discrete tokens with continuous residuals via a focal modulation codec and hybrid Transformer to improve speaker retention and reduce autoregressive steps in speech language models.
Introduces XLSR-Thai encoder, U-Align alignment, and Thai-SUP data pipeline to enable multitask speech understanding SLLMs for Thai.
citing papers explorer
- Speech Meets ELF: Audio Conditional Continuous-Target Diffusion for Speech Recognition and Translation
ELF-S2T applies audio-conditioned flow-matching on continuous text latents from pre-trained ELF to achieve competitive ASR and S2TT results, with analysis showing shared close-distance confusion in latent space.
- Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
Introduces XLSR-Thai encoder, U-Align alignment, and Thai-SUP data pipeline to enable multitask speech understanding SLLMs for Thai.