A unified framework for automated code transformation and pragma insertion

Jinming Zhuang, Shaojie Xiang, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, Tony Mao, Zhiru Zhang, Peipei Zhou · 2025 · arXiv 6628.370887

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration

cs.AR · 2026-05-22 · unverdicted · novelty 6.0

DORA is an instruction-based DNN accelerator architecture with a two-stage compilation framework that delivers stable efficiency across varied workloads and up to 5x throughput gains versus prior accelerators on FPGA.

DAE4HLS: Exposing Memory-Level Parallelism for High-Level Synthesis using Explicit Decoupling

cs.AR · 2026-05-22 · unverdicted · novelty 6.0

DAE4HLS enables explicit decoupling of access and execute in HLS to unlock memory-level parallelism, delivering 10-79x speedups for complex workloads on commercial and dynamic HLS tools.

CODO: An Automated Compiler for Comprehensive Dataflow Optimization

cs.AR · 2026-04-14 · unverdicted · novelty 6.0 · 2 refs

CODO automates comprehensive dataflow optimization on FPGAs, achieving 1.45x-4.52x speedups on kernels and up to 33.8x on DNN models over state-of-the-art frameworks.

da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

cs.AR · 2025-07-06 · unverdicted · novelty 6.0

A distributed arithmetic algorithm for CMVM operations on FPGAs reduces area by up to one third and also reduces latency for quantized neural networks; it is integrated into hls4ml.

FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration

cs.AR · 2026-04-08 · unverdicted · novelty 5.0

FILCO introduces a real-time reconfigurable composing architecture for DNN acceleration that achieves 1.3x-5x better throughput and hardware efficiency than prior designs on diverse workloads via an analytical model and two-stage design space exploration.

Physics-Informed Graph Neural Networks for Transverse Momentum Estimation in CMS Trigger Systems

cs.LG · 2025-07-25 · unverdicted · novelty 5.0

Physics-informed GNNs with four detector-aware graph constructions and a custom message passing layer achieve MAE 0.8525 for pT estimation on CMS trigger data with over 55% fewer parameters than baselines.

To Overlay or to Customize? Revisiting Architectural Choices in Heterogeneous Systems

cs.AR · 2026-05-22 · unverdicted · novelty 3.0

Systematic study concludes overlay architectures suit frequent model switching in current autonomous driving setups, while customized ones may become preferable as bitstream reload overhead decreases.

citing papers explorer

Showing 7 of 7 citing papers.

DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration

cs.AR · 2026-05-22 · unverdicted · none · ref 64

DAE4HLS: Exposing Memory-Level Parallelism for High-Level Synthesis using Explicit Decoupling

cs.AR · 2026-05-22 · unverdicted · none · ref 3

CODO: An Automated Compiler for Comprehensive Dataflow Optimization

cs.AR · 2026-04-14 · unverdicted · none · ref 8 · 2 links

da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

cs.AR · 2025-07-06 · unverdicted · none · ref 18

FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration

cs.AR · 2026-04-08 · unverdicted · none · ref 40

Physics-Informed Graph Neural Networks for Transverse Momentum Estimation in CMS Trigger Systems

cs.LG · 2025-07-25 · unverdicted · none · ref 8

To Overlay or to Customize? Revisiting Architectural Choices in Heterogeneous Systems

cs.AR · 2026-05-22 · unverdicted · none · ref 58
