Exploring End-to-End Techniques for Low-Resource Speech Recognition

Alexander Zatvornitskiy; Ivan Medennikov; Maxim Korenevsky; Vladimir Bataev

arxiv: 1807.00868 · v1 · pith:6DKP56MYnew · submitted 2018-07-02 · 💻 cs.SD · cs.CL · eess.AS

classification 💻 cs.SD cs.CL eess.AS

keywords speech, best, data, different, end-to-end, low-resource, recognition, techniques

In this work we present a simple grapheme-based system for low-resource speech recognition using Babel data for Turkish spontaneous speech (80 hours). We investigate the performance of different neural network architectures, including fully convolutional, recurrent, and ResNet with GRU. Different features and normalization techniques are compared as well. We also propose a modification of the CTC loss that uses segmentation during training, which improves results when decoding with a small beam size. Our best model achieves a word error rate of 45.8%, which is, to our knowledge, the best reported result for end-to-end systems using in-domain data for this task.
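
For readers unfamiliar with the metric quoted in the abstract, word error rate (WER) is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the authors' scoring pipeline, which is not described here. The function name `word_error_rate` and the example strings are hypothetical, and whitespace tokenization with no text normalization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Assumption: words are separated by whitespace and compared exactly,
    with no case folding or other normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion against four reference words.
example_wer = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference. Under this definition, a WER of 45.8% corresponds to roughly 46 word-level edits per 100 reference words.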
