arxiv: 1708.05466 · v1 · pith:AV5566HTnew · submitted 2017-08-17 · 💻 cs.CL

Large-Scale Domain Adaptation via Teacher-Student Learning

classification 💻 cs.CL
keywords model, data, domain, adaptation, speech, accuracy, acoustic, target

High-accuracy speech recognition requires a large amount of transcribed data for supervised training. In the absence of such data, domain adaptation of a well-trained acoustic model can be performed, but even here, high accuracy usually requires significant labeled data from the target domain. In this work, we propose an approach to domain adaptation that does not require transcriptions but instead uses a corpus of unlabeled parallel data, consisting of pairs of samples from the source domain of the well-trained model and the desired target domain. To perform adaptation, we employ teacher/student (T/S) learning, in which the posterior probabilities generated by the source-domain model are used in lieu of labels to train the target-domain model. We evaluate the proposed approach in two scenarios: adapting a clean acoustic model to noisy speech, and adapting an adult-speech acoustic model to children's speech. Significant improvements in accuracy are obtained, with reductions in word error rate of up to 44% over the original source model, without the need for transcribed data in the target domain. Moreover, we show that increasing the amount of unlabeled data results in additional model robustness, which is particularly beneficial when using simulated training data in the target domain.
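
To make the T/S idea in the abstract concrete, here is a minimal sketch of a per-frame objective in which teacher posteriors replace hard labels. It assumes the common formulation where the student minimizes cross-entropy against the teacher's output distribution; the abstract does not spell out the exact loss. The names `log_softmax` and `ts_loss`, and the use of raw logits as plain lists, are illustrative assumptions, not taken from the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one frame's output logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def ts_loss(teacher_logits, student_logits):
    """Frame-averaged cross-entropy of student posteriors against teacher posteriors.

    teacher_logits[t]: teacher outputs for source-domain frame t (e.g. clean speech).
    student_logits[t]: student outputs for the parallel target-domain frame t
                       (e.g. the same utterance with noise added).
    No transcription is used; the teacher's posteriors act as soft labels.
    (Hypothetical helper, assumed formulation.)
    """
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        p_teacher = [math.exp(v) for v in log_softmax(t_frame)]
        log_p_student = log_softmax(s_frame)
        total -= sum(pt * ls for pt, ls in zip(p_teacher, log_p_student))
    return total / len(teacher_logits)

# Toy parallel pair: one frame, three output classes.
teacher = [[2.0, 0.0, 0.0]]
matched_student = [[2.0, 0.0, 0.0]]
mismatched_student = [[0.0, 2.0, 0.0]]
print(ts_loss(teacher, matched_student) < ts_loss(teacher, mismatched_student))
```

In a real system the student would be initialized from the teacher and updated by gradient descent on this quantity over the parallel corpus; the sketch only shows how the loss is computed from paired frames.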

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update

    eess.AS 2026-04 unverdicted novelty 7.0

    Simultaneous ensemble teacher update with the student model improves unsupervised domain adaptation for ASR, reducing WER by 4.6% on the Switchboard eval00 set.

  2. Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

    eess.AS 2019-07 unverdicted novelty 4.0

    Knowledge distillation from an external RNN language model to a seq2seq ASR model yields 9.3% CER on Chinese datasets, an 18.42% relative improvement over the baseline without test-time fusion components.