pith. machine review for the scientific record.

arxiv: 1603.03185 · v2 · submitted 2016-03-10 · 💻 cs.CL · cs.LG · cs.SD

Recognition: unknown

Personalized Speech Recognition on Mobile Devices

classification 💻 cs.CL cs.LG cs.SD
keywords memory, footprint, model, dictation, faster, information, language, real-time

We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
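
The two headline figures in the abstract, word error rate and speed relative to real time, can be illustrated with a minimal sketch. It assumes the usual definitions: word error rate as word-level edit distance divided by the number of reference words, and speed as audio duration divided by decoding time, summarized by the median over utterances. The function names, the sample utterance, and the timings are hypothetical and do not come from the paper, which may apply text normalization or scoring rules not shown here.

```python
from statistics import median


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (the previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def median_speedup(audio_seconds, decode_seconds):
    """Median over utterances of audio duration divided by decoding time."""
    return median(a / d for a, d in zip(audio_seconds, decode_seconds))


if __name__ == "__main__":
    # Hypothetical utterance: one substitution plus one deletion over 4 words.
    print(word_error_rate("call john smith now", "call jon smith"))
    # Hypothetical timings: each utterance decodes in 1/7 of its duration.
    print(median_speedup([7.0, 14.0, 3.5], [1.0, 2.0, 0.5]))
```

Under these definitions, a median speedup of 7 means a typical utterance is decoded in one seventh of its audio duration, and a 13.5% word error rate means 13.5 word-level errors per 100 reference words.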

This paper has not been read by Pith yet.
