
arXiv: 1902.01951 · v2 · pith:P722CFBFnew · submitted 2019-02-02 · 📡 eess.AS · cs.CL · cs.LG · cs.SD

Using multi-task learning to improve the performance of acoustic-to-word and conventional hybrid models

classification 📡 eess.AS, cs.CL, cs.LG, cs.SD
keywords model, acoustic-to-word, hybrid, models, multi-task, training, acoustic, approach

Acoustic-to-word (A2W) models that allow direct mapping from acoustic signals to word sequences are an appealing approach to end-to-end automatic speech recognition due to their simplicity. However, prior works have shown that modelling A2W typically encounters issues of data sparsity that prevent training such a model directly. So far, pre-training initialization is the only approach proposed to deal with this issue. In this work, we propose to build a shared neural network and optimize A2W and conventional hybrid models in a multi-task manner. Our results show that training an A2W model is much more stable with our multi-task model without pre-training initialization, and results in a significant improvement compared to a baseline model. Experiments also reveal that the performance of a hybrid acoustic model can be further improved when jointly training with a sequence-level optimization criterion such as acoustic-to-word.
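As a rough illustration of the multi-task idea in the abstract, the sketch below combines a frame-level loss for a hybrid output head with a sequence-level loss for an A2W output head, both assumed to sit on top of one shared network. Several details are assumptions rather than anything stated in the abstract: CTC is used as the A2W criterion, frame-level cross-entropy is used for the hybrid head, and the two are mixed with a hypothetical interpolation weight `lam`. All function names are illustrative only.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one frame's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def logsumexp(*xs):
    """log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def frame_cross_entropy(frame_logits, state_targets):
    """Mean per-frame cross-entropy against aligned state labels
    (assumed criterion for the hybrid head)."""
    total = 0.0
    for logits, target in zip(frame_logits, state_targets):
        total -= log_softmax(logits)[target]
    return total / len(state_targets)

def ctc_loss(frame_logits, words, blank=0):
    """Negative log-likelihood of a word sequence under CTC
    (assumed sequence-level criterion for the A2W head)."""
    log_probs = [log_softmax(l) for l in frame_logits]
    # Interleave blanks: [blank, w1, blank, w2, ..., blank]
    ext = [blank]
    for w in words:
        ext += [w, blank]
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different words.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new
    final = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -final

def multitask_loss(hybrid_logits, state_targets, word_logits, words, lam=0.5):
    """Weighted sum of the two task losses; `lam` is a hypothetical weight."""
    a2w = ctc_loss(word_logits, words)
    hybrid = frame_cross_entropy(hybrid_logits, state_targets)
    return lam * a2w + (1.0 - lam) * hybrid

# Toy usage: 2 frames, 4 hybrid states, 3 word-level outputs (index 0 = blank).
hybrid_logits = [[0.0] * 4, [0.0] * 4]
word_logits = [[0.0] * 3, [0.0] * 3]
loss = multitask_loss(hybrid_logits, [0, 3], word_logits, [1])
```

In a real system both sets of logits would come from the same shared layers, so the gradient of the combined loss updates those layers with signal from both tasks. The weighting and the choice of criteria here are placeholders, not the configuration reported in the paper.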

