A neural attention model for speech command recognition

Christoph Bernkopf; Douglas Coimbra de Andrade; Martin Loesener Da Silva Viana; Sabato Leo

arxiv: 1808.08929 · v1 · pith:UHEEB4IZnew · submitted 2018-08-27 · eess.AS · cs.SD

Douglas Coimbra de Andrade, Sabato Leo, Martin Loesener Da Silva Viana, Christoph Bernkopf

classification: eess.AS, cs.SD

keywords: recognition, attention, commands, speech, performance, command, convolutional, model

This paper introduces a convolutional recurrent network with attention for speech command recognition. Attention models are powerful tools for improving performance on natural language, image captioning and speech tasks. The proposed model establishes a new state-of-the-art accuracy of 94.1% on the Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-command recognition task), while keeping a small footprint of only 202K trainable parameters. Results are compared with previous convolutional implementations on 5 different tasks: 20-command recognition (V1 and V2), 12-command recognition (V1), 35-word recognition (V1) and left-right (V1). We show detailed performance results and demonstrate that the proposed attention mechanism not only improves performance but also allows inspecting which regions of the audio the network took into consideration when outputting a given category.
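The abstract does not spell out how the attention mechanism is computed, so the following is only a minimal sketch of one common form, dot-product attention pooling over per-frame features. The function names, the `query` vector and the toy numbers are all assumptions made for illustration, not the authors' implementation. It shows why such a mechanism can be inspected: the weights form a distribution over time steps, so the largest weight points at the audio frame that contributed most to the pooled representation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    """Dot-product attention over per-frame feature vectors.

    frames: list of T feature vectors (e.g. recurrent outputs), each of length D.
    query:  one vector of length D (hypothetical; how it is produced is an assumption).
    Returns (weights, context): T weights summing to 1 and the weighted
    average of the frames.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [sum(w * frame[d] for w, frame in zip(weights, frames))
               for d in range(dim)]
    return weights, context

# Toy input: 4 time steps with 2 features each (made-up numbers).
frames = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.1, 0.2]]
query = [1.0, 1.0]
weights, context = attention_pool(frames, query)

# Index of the time step that received the most attention.
peak_frame = max(range(len(weights)), key=weights.__getitem__)
```

In a trained model the weights would typically be plotted against the waveform or spectrogram to see which part of the utterance drove the prediction; here `peak_frame` plays that role for the toy input.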

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multi-layer Attention Mechanism for Speech Keyword Recognition
cs.LG 2019-07 unverdicted novelty 4.0

Introduces multi-layer attention for keyword spotting that incorporates pre-extraction layer information to reduce bias in LSTM attention weights, reporting favorable results versus CNN and bi-LSTM baselines on Google...