pith. machine review for the scientific record.

arxiv: 1710.09026 · v2 · submitted 2017-10-25 · cs.LG · cs.CL · eess.AS · stat.ML


Trace norm regularization and faster inference for embedded speech recognition RNNs

keywords: embedded · large · training · connected · faster · fully · inference · kernels
abstract

We propose and evaluate new techniques for compressing and speeding up dense matrix multiplications as found in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, we introduce and study a trace norm regularization technique for training low rank factored versions of matrix multiplications. Compared to standard low rank training, we show that our method leads to good accuracy versus number of parameter trade-offs and can be used to speed up training of large models. For speedup, we enable faster inference on ARM processors through new open sourced kernels optimized for small batch sizes, resulting in 3x to 7x speed ups over the widely used gemmlowp library. Beyond LVCSR, we expect our techniques and kernels to be more generally applicable to embedded neural networks with large fully connected or recurrent layers.
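The trace norm (also called the nuclear norm) penalty mentioned in the abstract has a useful variational form: for a factored weight matrix W = UV, the quantity (‖U‖²_F + ‖V‖²_F)/2 upper-bounds ‖UV‖_*, with equality at an optimal factorization, which is what makes the penalty convenient for training low-rank factored layers. The NumPy sketch below illustrates this identity; it is a minimal illustration of the mathematical fact, not the paper's training code, and all function names here are our own.

```python
import numpy as np

def trace_norm(W):
    # The trace (nuclear) norm is the sum of the singular values of W.
    return np.linalg.svd(W, compute_uv=False).sum()

def factored_penalty(U, V):
    # Frobenius penalty on the factors of W = U @ V; this upper-bounds
    # trace_norm(U @ V) and matches it at an optimal factorization.
    return 0.5 * (np.sum(U**2) + np.sum(V**2))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))

# One optimal factorization comes from the SVD W = A diag(S) B^T:
# take U = A sqrt(diag(S)) and V = sqrt(diag(S)) B^T.
A, S, Bt = np.linalg.svd(W, full_matrices=False)
U = A * np.sqrt(S)
V = np.sqrt(S)[:, None] * Bt

assert np.allclose(U @ V, W)
assert np.isclose(factored_penalty(U, V), trace_norm(W))
```

In this factored form, penalizing the sum of squared Frobenius norms of U and V during training acts as trace norm regularization on the product, encouraging low effective rank without ever computing an SVD inside the training loop.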

This paper has not been read by Pith yet.

discussion (0)
