A distributed arithmetic algorithm for constant matrix-vector multiplication (CMVM) operations on FPGAs reduces area by up to one third and lowers latency for quantized neural networks; it is integrated into hls4ml.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AR (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
A hybrid FPGA-AI Engine deployment of a dynamic GNN for the Belle II trigger achieves a throughput of 2.94M events/s at 7.15 µs latency, a 53% throughput improvement, while reducing DSP usage from 99% to 19%.
citing papers explorer
- da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs
  A distributed arithmetic algorithm for constant matrix-vector multiplication (CMVM) operations on FPGAs reduces area by up to one third and lowers latency for quantized neural networks; it is integrated into hls4ml.
- Reconfigurable Computing Challenge: Real-Time Graph Neural Networks for Online Event Selection in Big Science
  A hybrid FPGA-AI Engine deployment of a dynamic GNN for the Belle II trigger achieves a throughput of 2.94M events/s at 7.15 µs latency, a 53% throughput improvement, while reducing DSP usage from 99% to 19%.