Diffusion language models and a CTC-USDM joint decoder improve ASR accuracy over standard approaches.
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
Citing papers explorer
- Diffusion Language Models for Speech Recognition
  Diffusion language models and a CTC-USDM joint decoder improve ASR accuracy over standard approaches.
- Text-Utilization for Encoder-dominated Speech Recognition Models
  Encoder-dominated ASR models using text-only data via modality matching and downsampling achieve comparable performance to larger-decoder models on LibriSpeech, with simple random duration approaches proving effective.
- LLMs and Speech: Integration vs. Combination
  Tight integration of acoustic models with LLMs for ASR is ablated against shallow fusion across label units, fine-tuning strategies, LLM sizes, and joint CTC decoding to mitigate hallucinations.