A unified framework for automated code transformation and pragma insertion
7 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration
  DORA is an instruction-based DNN accelerator architecture with a two-stage compilation framework that delivers stable efficiency across varied workloads and up to 5x throughput gains versus prior accelerators on FPGA.
- DAE4HLS: Exposing Memory-Level Parallelism for High-Level Synthesis using Explicit Decoupling
  DAE4HLS enables explicit decoupling of access and execute in HLS to unlock memory-level parallelism, delivering 10-79x speedups for complex workloads on commercial and dynamic HLS tools.
- CODO: An Automated Compiler for Comprehensive Dataflow Optimization
  CODO automates comprehensive dataflow optimization on FPGAs, achieving 1.45x-4.52x speedups on kernels and up to 33.8x on DNN models over state-of-the-art frameworks.
- da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs
  A distributed arithmetic algorithm for CMVM operations on FPGAs reduces area by up to one third and latency for quantized neural networks, integrated into hls4ml.
- FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration
  FILCO introduces a real-time reconfigurable composing architecture for DNN acceleration that achieves 1.3x-5x better throughput and hardware efficiency than prior designs on diverse workloads via an analytical model and two-stage design space exploration.
- Physics-Informed Graph Neural Networks for Transverse Momentum Estimation in CMS Trigger Systems
  Physics-informed GNNs with four detector-aware graph constructions and a custom message passing layer achieve MAE 0.8525 for pT estimation on CMS trigger data with over 55% fewer parameters than baselines.
- To Overlay or to Customize? Revisiting Architectural Choices in Heterogeneous Systems
  Systematic study concludes overlay architectures suit frequent model switching in current autonomous driving setups, while customized ones may become preferable as bitstream reload overhead decreases.