arxiv: 1712.02427 · v1 · pith:AKYORW33new · submitted 2017-12-06 · 💻 cs.LG

High performance ultra-low-precision convolutions on mobile devices

classification 💻 cs.LG
keywords workloads · armv7 · deep · devices · implementation · learning · mobile · ultra-low-precision

Many applications of mobile deep learning, especially real-time computer vision workloads, are constrained by computation power. This is particularly true for workloads running on older consumer phones, where a typical device might be powered by a single- or dual-core ARMv7 CPU. We provide an open-source implementation and a comprehensive analysis of (to our knowledge) the state-of-the-art ultra-low-precision (<4-bit precision) implementation of the core primitives required for modern deep learning workloads on ARMv7 devices, and demonstrate speedups of 4x-20x over our additional state-of-the-art float32 and int8 baselines.
