NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic

Amirhossein Esmaili; Arash Fayyazi; Atharva Khare; Mahdi Nazemi; Massoud Pedram; Soheil Nazar Shahsavani

arxiv: 2104.05421 · v1 · pith:JIS64XXCnew · submitted 2021-04-07 · 💻 cs.LG · cs.AI

Mahdi Nazemi, Arash Fayyazi, Amirhossein Esmaili, Atharva Khare, Soheil Nazar Shahsavani, Massoud Pedram

keywords: processing, ultra-low-latency, accelerators, design, fpga, latency, logic, lower

While there is a large body of research on efficient processing of deep neural networks (DNNs), ultra-low-latency realization of these models for applications with stringent, sub-microsecond latency requirements continues to be an unresolved, challenging problem. Field-programmable gate array (FPGA)-based DNN accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms considering their performance, flexibility, and energy efficiency. This paper presents NullaNet Tiny, an across-the-stack design and optimization framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators. The key idea is to replace expensive operations required to compute various filter/neuron functions in a DNN with Boolean logic expressions that are mapped to the native look-up tables (LUTs) of the FPGA device (examples of such operations are multiply-and-accumulate and batch normalization). At about the same level of classification accuracy, compared to Xilinx's LogicNets, our design achieves 2.36$\times$ lower latency and 24.42$\times$ lower LUT utilization.
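The key idea in the abstract, folding a neuron's arithmetic into a Boolean function that fits in a LUT, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single neuron with six binary inputs, a sign-style activation, and made-up parameter values, and all function names (`neuron_output`, `build_truth_table`, `lut_eval`) are hypothetical. It enumerates every input pattern once, stores the resulting output bits as a truth table, and then evaluates the neuron by table lookup alone.

```python
from itertools import product


def neuron_output(bits, weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference arithmetic: multiply-and-accumulate, batch norm, binary activation."""
    # Assumption: {0, 1} inputs are mapped to {-1, +1} before the MAC.
    acc = sum(w * (2 * b - 1) for w, b in zip(weights, bits)) + bias
    normed = gamma * (acc - mean) / (var + eps) ** 0.5 + beta
    return 1 if normed >= 0 else 0


def build_truth_table(k, **params):
    """Enumerate all 2**k input patterns and pack the output bits into an int."""
    table = 0
    for index, bits in enumerate(product((0, 1), repeat=k)):
        if neuron_output(bits, **params):
            table |= 1 << index
    return table


def lut_eval(table, bits):
    """Evaluate the neuron by lookup only; bits[0] is the most significant bit."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return (table >> index) & 1


# Illustrative parameters only; they are not taken from the paper.
PARAMS = dict(
    weights=[0.5, -1.0, 0.75, 0.25, -0.5, 1.0],
    bias=0.1,
    gamma=1.0,
    beta=0.0,
    mean=0.0,
    var=1.0,
)
K = 6  # assumed fan-in, chosen to match a 6-input LUT

TABLE = build_truth_table(K, **PARAMS)

if __name__ == "__main__":
    print(f"64-bit truth table: {TABLE:#018x}")
    print("lookup for all-ones input:", lut_eval(TABLE, (1,) * K))
```

Once the table exists, the multiply-and-accumulate and batch-normalization arithmetic is no longer evaluated at inference time; only the lookup remains. The sketch ignores how a real flow would limit neuron fan-in or minimize the logic across neurons.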

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems
cs.AR 2026-05 unverdicted novelty 6.0

Novel grouped-convolution block and tuning algorithm for precomputed 1D-CNNs enables 95% F1 atrial-fibrillation detection on MIT-BIH ECG using 2844 LUTs on AMD Spartan-7 S15 with zero DSPs or BRAM.