
arxiv: 1906.06207 · v1 · pith:XAJ7NHYInew · submitted 2019-06-14 · 💻 cs.CL · stat.ML

Cumulative Adaptation for BLSTM Acoustic Models

keywords: adaptation, network, used, acoustic, blstm, cumulative, environment, error

This paper addresses robust speech recognition as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of learning temporal relationships and translation-invariant representations, is used for robust acoustic modelling. Further, i-vectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing an 8% relative improvement in word error rate on the NIST Hub5 2000 evaluation test set. Enhancing the first-pass i-vector based adaptation with a second-pass adaptation that uses speaker- and environment-dependent transformations within the network yields a further 5% relative improvement in word error rate. We have reevaluated the features used to estimate i-vectors and their normalization to achieve the best performance in a modern large-scale automatic speech recognition system.
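
To make the two relative figures concrete, the following is a minimal sketch of how successive relative word error rate (WER) improvements compound. It assumes the 5% gain is measured relative to the i-vector-adapted system, which the abstract suggests but does not state outright. The baseline WER of 20.0 is a hypothetical placeholder, not a result from the paper, and the function names are illustrative only.

```python
def apply_relative_improvement(wer, relative_gain):
    """Return the WER after a relative reduction (0.08 means 8%)."""
    return wer * (1.0 - relative_gain)


def cumulative_relative_improvement(gains):
    """Combine successive relative gains into one overall relative gain.

    Assumes each gain is measured against the system produced by the
    previous step, so the remaining error fractions multiply.
    """
    remaining = 1.0
    for gain in gains:
        remaining *= 1.0 - gain
    return 1.0 - remaining


# Hypothetical baseline WER in percent; NOT a figure from the paper.
baseline_wer = 20.0

# First pass: i-vector input adaptation (8% relative, from the abstract).
first_pass_wer = apply_relative_improvement(baseline_wer, 0.08)

# Second pass: transformations within the network (5% relative, from the
# abstract), assumed here to be relative to the first-pass system.
second_pass_wer = apply_relative_improvement(first_pass_wer, 0.05)

total_gain = cumulative_relative_improvement([0.08, 0.05])

print(f"first pass WER:  {first_pass_wer:.2f}")
print(f"second pass WER: {second_pass_wer:.2f}")
print(f"overall relative improvement: {total_gain:.1%}")
```

Under that assumption, the two steps together amount to roughly a 12.6% relative improvement over the unadapted baseline, slightly less than the 13% a simple sum would suggest.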
