Low-memory GEMM-based convolution algorithms for deep neural networks

Andrew Anderson; Aravind Vasudevan; Cormac Keane; David Gregg

arxiv: 1709.03395 · v1 · pith:VMM24HWRnew · submitted 2017-09-08 · 💻 cs.CV

classification 💻 cs.CV

keywords: algorithms, convolution, just, space, additional, approaches, low-memory, operations

Deep neural networks (DNNs) require very large amounts of computation, both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as general matrix multiplication (GEMM). However, as we demonstrate in this paper, there are many different ways to express DNN convolution operations using GEMM. Although the different approaches all perform the same number of operations, the size of their temporary data structures differs significantly. Convolution of an input matrix with dimensions $C \times H \times W$ with a $K \times K$ kernel requires $O(K^2CHW)$ additional space using the classical im2col approach. More recently, memory-efficient approaches requiring just $O(KCHW)$ auxiliary space have been proposed. We present two novel GEMM-based algorithms that require just $O(MHW)$ and $O(KW)$ additional space respectively, where $M$ is the number of channels in the result of the convolution. These algorithms dramatically reduce the space overhead of DNN convolution, making it much more suitable for memory-limited embedded systems. Experimental evaluation shows that our low-memory algorithms are just as fast as the best patch-building approaches despite requiring just a fraction of the additional memory. Our low-memory algorithms have excellent data locality, which gives them a further edge over patch-building algorithms when multiple cores are used. As a result, our low-memory algorithms often outperform the best patch-building algorithms when using multiple threads.
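
To make the $O(K^2CHW)$ figure for the classical im2col approach concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes stride 1 and no padding, so the output is $(H-K+1) \times (W-K+1)$; the function names (`im2col`, `matmul`, `conv_im2col`, `conv_direct`) and the toy input values are hypothetical and chosen only for illustration. It builds the patch matrix, performs the convolution as a single GEMM, and checks the result against a direct convolution.

```python
# Illustrative sketch only: im2col + GEMM convolution, stride 1, no padding.
# All function names and test values are assumptions, not taken from the paper.

def im2col(image, C, H, W, K):
    """Build the (C*K*K) x (Ho*Wo) patch matrix from a C x H x W image."""
    Ho, Wo = H - K + 1, W - K + 1
    rows = []
    for c in range(C):
        for ky in range(K):
            for kx in range(K):
                # One row per (channel, kernel offset): the image window
                # shifted by (ky, kx), flattened over output positions.
                rows.append([image[c][y + ky][x + kx]
                             for y in range(Ho) for x in range(Wo)])
    return rows


def matmul(A, B):
    """Plain GEMM: (m x n) times (n x p), using nested lists."""
    n, p = len(B), len(B[0])
    return [[sum(a_row[k] * B[k][j] for k in range(n)) for j in range(p)]
            for a_row in A]


def conv_im2col(image, kernels, C, H, W, K, M):
    """kernels is M x C x K x K. Returns (M x Ho*Wo output, patch element count)."""
    patches = im2col(image, C, H, W, K)
    # Flatten each kernel into a row of length C*K*K, in the same
    # (channel, ky, kx) order used for the patch matrix rows.
    kmat = [[kernels[m][c][ky][kx]
             for c in range(C) for ky in range(K) for kx in range(K)]
            for m in range(M)]
    return matmul(kmat, patches), len(patches) * len(patches[0])


def conv_direct(image, kernels, C, H, W, K, M):
    """Reference convolution with no auxiliary patch matrix."""
    Ho, Wo = H - K + 1, W - K + 1
    return [[sum(kernels[m][c][ky][kx] * image[c][y + ky][x + kx]
                 for c in range(C) for ky in range(K) for kx in range(K))
             for y in range(Ho) for x in range(Wo)]
            for m in range(M)]


# Toy problem with small integer values so results compare exactly.
C, H, W, K, M = 2, 5, 5, 3, 3
image = [[[(c * H * W + y * W + x) % 7 for x in range(W)]
          for y in range(H)] for c in range(C)]
kernels = [[[[(m + c + ky * K + kx) % 3 - 1 for kx in range(K)]
             for ky in range(K)] for c in range(C)] for m in range(M)]

out, patch_elems = conv_im2col(image, kernels, C, H, W, K, M)
input_elems = C * H * W
```

In this toy case the patch matrix holds `C*K*K*(H-K+1)*(W-K+1)` elements, several times the size of the input itself, which is the temporary-storage overhead the abstract describes.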

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Indirect Convolution Algorithm
cs.CV 2019-07 unverdicted novelty 7.0

The Indirect Convolution algorithm avoids im2col by using an indirection buffer, reducing memory overhead proportionally to input channels and outperforming GEMM-based methods by up to 62% for convolutions requiring t...

Mapped Convolutions
cs.CV 2019-06 unverdicted novelty 7.0

Mapped convolutions generalize standard convolutions by decoupling sampling and weighting, enabling direct convolution on spherical and mesh data with a 17% improvement in spherical depth estimation.