
arxiv: 1904.10045 · v1 · pith:EJFIGMOZnew · submitted 2019-03-27 · 📡 eess.AS · cs.NE · cs.SD

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

classification 📡 eess.AS · cs.NE · cs.SD
keywords recognition · model · speech · ctc-based · errors · results · correction · language

Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model through WFST-based decoding to achieve promising results. This is even more important for Mandarin speech recognition, because Mandarin has many homophones, which cause a large number of substitution errors. The linguistic information introduced by the language model helps to resolve these substitution errors. In this work, we propose a transformer-based spelling correction model to automatically correct errors, especially the substitution errors, made by a CTC-based Mandarin speech recognition system. Specifically, we train a transformer with an encoder-decoder architecture using the recognition results generated by CTC-based systems as input and the ground-truth transcriptions as output, a setup similar to machine translation. Results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model achieves a CER of 3.41%, a relative improvement of 22.9% and 53.2% over the baseline CTC-based systems decoded with and without a language model, respectively.
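The abstract reports results as character error rate (CER) and relative improvement. The following is a minimal sketch of how these two quantities are conventionally computed, not the authors' evaluation code. The function names and the example sentence are hypothetical, and the baseline CERs returned by `implied_baseline` are back-calculated from the abstract's percentages rather than figures stated in it.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline_cer: float, new_cer: float) -> float:
    """Fraction by which new_cer reduces baseline_cer."""
    return (baseline_cer - new_cer) / baseline_cer


def implied_baseline(new_cer: float, rel_improvement: float) -> float:
    """Invert relative_improvement to recover the baseline (an inferred value)."""
    return new_cer / (1.0 - rel_improvement)


# Hypothetical homophone substitution: 气 and 汽 are both pronounced "qi".
reference = "今天天气很好"
hypothesis = "今天天汽很好"
example_cer = cer(reference, hypothesis)  # one substitution over six characters

# Baselines implied by the abstract's numbers (inferred, not reported there).
baseline_with_lm = implied_baseline(3.41, 0.229)
baseline_without_lm = implied_baseline(3.41, 0.532)
```

Under these assumptions, the implied baselines come out to roughly 4.4% CER with a language model and 7.3% without one. Both are subject to rounding in the reported percentages.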

