Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

Ann Lee; Awni Hannun; Qiantong Xu; Ronan Collobert

arxiv: 1904.02619 · v1 · pith:XWJDO62Onew · submitted 2019-04-04 · 💻 cs.CL

keywords: model, efficient, separable, sequence-to-sequence, time-depth, architecture, convolution, convolutional

We propose a fully convolutional sequence-to-sequence encoder architecture with a simple and efficient decoder. Our model improves WER on LibriSpeech while being an order of magnitude more efficient than a strong RNN baseline. Key to our approach is a time-depth separable (TDS) convolution block, which greatly reduces the number of parameters in the model while keeping the receptive field large. We also give a stable and efficient beam search inference procedure that lets us integrate a language model effectively. Coupled with a convolutional language model, our time-depth separable convolution architecture improves WER by more than 22% relative over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set (test-other).
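To make the parameter savings of a time-depth separable block concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes the input is laid out as (T, w, c) (time, feature width, channels). The block is assumed to consist of two sub-blocks. The first is a convolution over time whose k x 1 kernel mixes the c channels but is shared across the w width positions; it is followed by ReLU, a residual connection, and layer norm. The second is a two-layer fully connected sub-block that mixes all w*c features of each frame, followed by a residual connection and layer norm. Biases and dropout are omitted. The names `tds_block`, `tds_params`, and `full_conv_params` are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each frame over its (w, c) features (assumed normalization axes).
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def tds_block(x, conv_w, fc1, fc2):
    """Hypothetical TDS block. x: (T, w, c); conv_w: (k, c, c); fc1, fc2: (w*c, w*c)."""
    T, w, c = x.shape
    k = conv_w.shape[0]
    pad = k // 2  # "same" padding in time; assumes an odd kernel size k
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))

    # Time convolution: each output frame sees k input frames and mixes the
    # c channels, with the same kernel reused at every one of the w positions.
    y = np.zeros_like(x)
    for t in range(T):
        window = xp[t:t + k]  # (k, w, c)
        y[t] = np.einsum("kwc,kcd->wd", window, conv_w)
    y = layer_norm(np.maximum(y, 0.0) + x)  # ReLU, residual, layer norm

    # Fully connected sub-block: mixes all w*c features within each frame.
    flat = y.reshape(T, w * c)
    z = np.maximum(flat @ fc1, 0.0) @ fc2
    return layer_norm((z + flat).reshape(T, w, c))  # residual, layer norm


def tds_params(k, w, c):
    # Weights only: k*c*c for the time conv plus two (w*c) x (w*c) linear layers.
    return k * c * c + 2 * (w * c) ** 2


def full_conv_params(k, w, c):
    # Weights only: an ordinary 1-D time convolution over all w*c input/output features.
    return k * (w * c) ** 2


# Illustrative (hypothetical) sizes: kernel 21, width 80, 10 channels.
print(tds_params(21, 80, 10), full_conv_params(21, 80, 10))  # 1282100 13440000
```

With these illustrative sizes, the sketch's TDS block has roughly a tenth of the weights of a dense time convolution with the same kernel size, so its receptive field in time is unchanged. The saving comes from the time kernel acting on c channels rather than on all w*c features.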