Online Predictive Coding for Dual-Mode Self-Supervised Speech Model

2026 · cs.SD · arXiv 2606.21268

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Dual-mode self-supervised speech models are pre-trained to handle streaming and non-streaming conditions simultaneously. However, their attention is computed over different context ranges, which often makes optimization difficult. In previous work, we proposed online registers, additional tokens intended to compensate for missing future context in streaming mode, but the gains remained limited. To address these issues, we introduce two improvements for robust dual-mode pre-training: (1) Online Predictive Coding (OPC), which regularizes the registers through multi-step future prediction, and (2) Dual-mode Layer Normalization, which stabilizes optimization. We fine-tune the proposed dual-mode self-supervised speech models for speech recognition on LibriSpeech and WSJ. Results show that OPC consistently reduces the online-offline performance gap; at 160 ms latency on LibriSpeech, word error rates improve from 3.65% to 3.40% on test-clean and from 10.15% to 9.65% on test-other.
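The absolute WER figures quoted above can also be read as relative reductions. The following is a minimal sketch of that arithmetic, using only the four numbers stated in the abstract; the helper name `relative_wer_reduction` is illustrative and does not come from the paper.

```python
def relative_wer_reduction(baseline: float, improved: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline - improved) / baseline


# WERs (%) quoted in the abstract for 160 ms latency on LibriSpeech:
# (before, after) for each test split.
results = {
    "test-clean": (3.65, 3.40),
    "test-other": (10.15, 9.65),
}

for split, (baseline, improved) in results.items():
    absolute = baseline - improved
    relative = relative_wer_reduction(baseline, improved)
    print(f"{split}: {absolute:.2f} absolute, {relative:.1f}% relative")
```

Under these numbers, the improvement is roughly 6.8% relative on test-clean and 4.9% relative on test-other.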

representative citing papers

Online Predictive Coding for Dual-Mode Self-Supervised Speech Model

cs.SD · 2026-06-19 · unverdicted · novelty 3.0

Proposes OPC and dual-mode LN to improve dual-mode SSL speech models, narrowing the online-offline gap and reducing WER at 160 ms latency on LibriSpeech from 3.65% to 3.40% (test-clean).

