E-Paraformer: A faster and better parallel transformer for non-autoregressive end-to-end Mandarin speech recognition
1 Pith paper cites this work.
Representative citing paper (field: cs.CL; year: 2026):
Text-Utilization for Encoder-dominated Speech Recognition Models
Encoder-dominated ASR models using text-only data via modality matching and downsampling achieve comparable performance to larger-decoder models on LibriSpeech, with simple random duration approaches proving effective.