Reproducing and dissecting denoising language models for speech recognition
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Encoder-dominated ASR models using text-only data via modality matching and downsampling achieve comparable performance to larger-decoder models on LibriSpeech, with simple random duration approaches proving effective.
citing papers explorer
- Diffusion Language Models for Speech Recognition
  Diffusion language models and a CTC-USDM joint decoder improve ASR accuracy over standard approaches.
- Text-Utilization for Encoder-dominated Speech Recognition Models
  Encoder-dominated ASR models using text-only data via modality matching and downsampling achieve comparable performance to larger-decoder models on LibriSpeech, with simple random duration approaches proving effective.