
arxiv: 1803.03790 · v1 · pith:GI4QKTSOnew · submitted 2018-03-10 · 💻 cs.AR

Towards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs

classification 💻 cs.AR
keywords: architecture, linear, matrix, multiplication, accelerating, array, extension, large-scale

Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work focuses only on accelerating matrix multiplication on FPGAs by adopting a linear systolic array. This paper works towards extending this architecture by proposing a scalable and highly configurable multi-array architecture. In addition, we propose a work-stealing scheme to keep the workload balanced among the multiple linear arrays. Furthermore, we develop an analytical model to determine the optimal design parameters. Experiments on a real-life convolutional neural network (CNN) show that we can obtain the optimal extension of the linear array architecture.
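To make the work-stealing idea concrete, here is a minimal software sketch. It is not the paper's hardware scheme. The abstract does not describe how tasks are formed or stolen, so everything below is an assumption for illustration: the function name `matmul_work_stealing`, the `tile` and `num_arrays` parameters, the choice of one output tile per task, the deliberately unbalanced starting partition, and the "steal from the tail of the longest queue" policy. Each simulated array takes at most one tile per round.

```python
from collections import deque


def matmul_work_stealing(A, B, tile=2, num_arrays=3):
    """Compute C = A x B tile by tile, balancing tiles across simulated arrays.

    Hypothetical illustration only: queues stand in for the linear arrays,
    and one "round" of the outer loop stands in for one unit of time.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]

    # One task per output tile, identified by the tile's top-left corner.
    tasks = [(i, j) for i in range(0, n, tile) for j in range(0, m, tile)]

    # Assumed worst-case partition: every task starts on array 0.
    queues = [deque() for _ in range(num_arrays)]
    queues[0].extend(tasks)

    done = [0] * num_arrays  # tiles completed by each array
    while any(queues):
        for a in range(num_arrays):
            if queues[a]:
                # Normal case: take the next tile from the local queue.
                i0, j0 = queues[a].popleft()
            else:
                # Idle array: steal from the tail of the longest queue.
                victim = max(range(num_arrays), key=lambda q: len(queues[q]))
                if not queues[victim]:
                    continue  # nothing left to steal this round
                i0, j0 = queues[victim].pop()

            # Compute every element of the claimed output tile.
            for i in range(i0, min(i0 + tile, n)):
                for j in range(j0, min(j0 + tile, m)):
                    C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
            done[a] += 1

    return C, done
```

Calling `matmul_work_stealing(A, B, tile=2, num_arrays=2)` returns the product together with a per-array count of completed tiles. Even though all tiles start on one queue, the counts in this simulation differ by at most one, which is the kind of balance a work-stealing scheme aims for.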
