Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference

Alessio Burrello; Daniele Jahier Pagliari; Enrico Macii; Giuseppe Maria Sarda; Luca Benini; Marian Verhelst; Massimo Poncino; Matteo Risso

arxiv: 2306.05060 · v1 · pith:B5E3BMMEnew · submitted 2023-06-08 · 💻 cs.LG

Matteo Risso, Alessio Burrello, Giuseppe Maria Sarda, Luca Benini, Enrico Macii, Massimo Poncino, Marian Verhelst, Daniele Jahier Pagliari

classification 💻 cs.LG

keywords: latency, energy, accuracy, accelerators, edge, heterogeneous, inference, multi-accelerator

The need to execute Deep Neural Networks (DNNs) at low latency and low power at the edge has spurred the development of new heterogeneous Systems-on-Chips (SoCs) encapsulating a diverse set of hardware accelerators. How to optimally map a DNN onto such multi-accelerator systems is an open problem. We propose ODiMO, a hardware-aware tool that performs a fine-grain mapping across different accelerators on-chip, splitting individual layers and executing them in parallel, to reduce inference energy consumption or latency, while taking into account each accelerator's quantization precision to maintain accuracy. Pareto-optimal networks in the accuracy vs. energy or latency space are pursued for three popular dataset/DNN pairs, and deployed on the DIANA heterogeneous ultra-low power edge AI SoC. We show that ODiMO reduces energy/latency by up to 33%/31% with limited accuracy drop (-0.53%/-0.32%) compared to manual heuristic mappings.
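
The idea of splitting a single layer across accelerators that run in parallel can be illustrated with a minimal sketch. This is not ODiMO's algorithm: the function name `best_split` and the per-channel cost parameters are hypothetical, the cost model is a simple linear one assumed for illustration, and the sketch ignores the accuracy impact of each accelerator's quantization precision, which the abstract says the tool accounts for. It only shows why balancing the work between two parallel units reduces the latency of the layer.

```python
def best_split(channels, cost_a, cost_b):
    """Split `channels` output channels between two accelerators.

    cost_a and cost_b are assumed (hypothetical) per-channel latencies.
    Both accelerators run in parallel, so the layer latency is the
    slower of the two. Returns (n_a, n_b, latency) for the best split.
    """
    best = None
    for n_a in range(channels + 1):
        n_b = channels - n_a
        # Parallel execution: the layer finishes when the slower unit does.
        latency = max(n_a * cost_a, n_b * cost_b)
        if best is None or latency < best[2]:
            best = (n_a, n_b, latency)
    return best


if __name__ == "__main__":
    # Accelerator A is assumed to be 3x faster per channel than B.
    print(best_split(64, 1.0, 3.0))
```

Under these assumed costs, the faster accelerator receives most of the channels, and the split is chosen so that neither unit waits long for the other. A precision-aware mapping would add an accuracy term to this trade-off, since assigning more channels to a lower-precision accelerator may cost accuracy.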
