PHAROS: Pipelined Heterogeneous Accelerators for Real-time Safety-critical Systems With Deadline Compliance

Alex K. Jones; Jinming Zhuang; Peipei Zhou; Sarah Schultz; Shixin Ji; Xingzhen Chen; Yihui Ren; Zheng Dong; Zhuoping Yang

arxiv: 2604.05308 · v1 · submitted 2026-04-07 · 💻 cs.AR

Pith reviewed 2026-05-10 19:40 UTC · model grok-4.3

classification 💻 cs.AR

keywords heterogeneous accelerators · real-time systems · design space exploration · schedulability analysis · preemption mechanisms · safety-critical systems · FIFO scheduling · EDF scheduling

The pith

PHAROS shows that adding soft real-time schedulability to accelerator design exploration finds workable hardware setups for more task sets than throughput-only methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

PHAROS is a design framework for spatially partitioned heterogeneous accelerators that centers optimization on meeting real-time deadlines rather than maximizing speed. It adds preemption support and scheduler designs for FIFO and EDF policies, then runs a custom design space exploration that uses soft real-time schedulability as its main objective and constraint. Across evaluations on diverse applications the method identifies feasible hardware configurations for a wider range of task sets and achieves better deadline compliance than baselines focused only on throughput. The work also supplies response-time analyses for the supported schedulers. This matters in safety-critical embedded systems where timely execution directly affects overall safety.

Core claim

Through modeling, analysis, and evaluation, PHAROS demonstrates that its soft real-time schedulability-oriented design space exploration discovers more feasible configurations for a broader range of task sets than throughput-oriented DSE baselines while delivering improved real-time performance; the framework introduces preemption mechanisms and scheduler designs for spatially partitioned heterogeneous accelerators under FIFO and EDF policies and provides response-time analyses for those algorithms.

What carries the argument

Soft real-time (SRT) schedulability-oriented design space exploration (DSE) that tailors objectives and constraints to schedulability, paired with preemption mechanisms for spatially partitioned heterogeneous accelerators under FIFO and EDF scheduling.
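
As a rough illustration of what a schedulability-oriented objective looks like, the sketch below applies the condition quoted from the paper further down this page (the system is SRT-schedulable if and only if the utilization of every accelerator does not exceed 1). The `Task` type, the accelerator names, and all numbers are hypothetical, and execution times are assumed to be known per accelerator; this is not the paper's model, only a minimal reading of that one condition.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period: float                # time between job releases (ms)
    exec_time: dict[str, float]  # accelerator name -> execution time per job (ms)


def accelerator_utilizations(tasks: list[Task]) -> dict[str, float]:
    """Sum exec_time / period per accelerator over all tasks that use it."""
    util: dict[str, float] = {}
    for task in tasks:
        for acc, e in task.exec_time.items():
            util[acc] = util.get(acc, 0.0) + e / task.period
    return util


def is_srt_schedulable(tasks: list[Task]) -> bool:
    """Condition quoted from the paper: every accelerator's utilization <= 1."""
    return all(u <= 1.0 for u in accelerator_utilizations(tasks).values())


# Hypothetical task set; names and numbers are illustrative only.
tasks = [
    Task("detect", period=20.0, exec_time={"acc0": 6.0, "acc1": 4.0}),
    Task("track", period=10.0, exec_time={"acc0": 3.0}),
    Task("plan", period=40.0, exec_time={"acc1": 10.0}),
]

util = accelerator_utilizations(tasks)
print({acc: round(u, 3) for acc, u in util.items()})
print("max utilization:", round(max(util.values()), 3))
print("SRT-schedulable:", is_srt_schedulable(tasks))
```

A throughput-oriented search would rank designs by a different metric; here the maximum per-accelerator utilization is the quantity a design must keep at or below 1.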

If this is right

A wider variety of task sets from safety-critical applications can be made SRT-schedulable on a feasible hardware configuration.
Hardware configurations can be selected to provide stronger guarantees on execution predictability.
Designers can optimize accelerator systems specifically for real-time constraints instead of average-case throughput.
Response-time bounds become available for FIFO and EDF schedulers running on these accelerators.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same schedulability-driven exploration approach could be extended to hard real-time constraints by strengthening the underlying analyses.
Future accelerator hardware designs might incorporate the preemption features as standard primitives once overheads are quantified.
Similar real-time-aware design space exploration may prove useful for other embedded platforms such as GPUs or reconfigurable fabrics.

Load-bearing premise

The introduced preemption mechanisms and soft real-time schedulability analysis for spatially partitioned heterogeneous accelerators can be realized in hardware without significant unmodeled overheads or interference that would invalidate the deadline guarantees.

What would settle it

A hardware implementation of the preemption mechanisms on a spatially partitioned heterogeneous accelerator, with measurements showing whether task sets predicted as schedulable by the analysis actually meet all deadlines under realistic interference.

Figures

Figures reproduced from arXiv: 2604.05308 by Alex K. Jones, Jinming Zhuang, Peipei Zhou, Sarah Schultz, Shixin Ji, Xingzhen Chen, Yihui Ren, Zheng Dong, Zhuoping Yang.

**Figure 2.** PHAROS hardware architecture.

… deadline compliance for tasksets with smaller periods, whereas these frameworks fail due to hardware inefficiency. Real-time theories for guiding accelerator design: To ensure deadline compliance, the accelerator design must be paired with real-time theories to analyze schedulability or response time. For a single accelerator, [27] gives the earliest formulation in the HRT sc…

**Figure 3.** Preemption pattern and overhead modeling.

**Figure 4.** PHAROS design space exploration.

… of computing a tile, storing, and loading the buffers are fixed in one accelerator and are only related to its design parameters:

$$\xi_i^k = e_{tile}^k + e_{store}^k + e_{load}^k \qquad (5)$$

where $e_{tile}^k$, $e_{store}^k$, and $e_{load}^k$ are functions of $A^k, \ldots, Z^k$. Specifically, when this accelerator is skipped ($b_i^k = 0$), $e_i^k$ is also 0. The preemption only happens when EDF scheduling…

**Figure 5.** Beam search progress.

… thus reducing the period results in a higher maximum speed in the system. When scaling down all periods proportionally to $x\%$, the utilization of all accelerators will increase to $\frac{1}{x\%}$. Thus, the potential of one system to further scale down the periods without $u > 1$ is determined by the maximum utilization of the accelerators.

4.2 Beam Search Heuristics. One significant challenge in P…

**Figure 6.** SRT-schedulable taskset design points of SRT-guided (SG) and throughput-guided (TG) EDF using FIFO and EDF

**Figure 8.** The response time statistics of FIFO vs. EDF sched…

**Figure 9.** Search time comparison of various beam search se…
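
The period-scaling argument in the excerpt next to Figure 5 can be checked with a few lines of arithmetic. The sketch below assumes utilization is execution time divided by period, so multiplying every period by a factor `scale` divides every utilization by `scale`; the function names and the utilization values are hypothetical.

```python
def scaled_utilizations(util: dict[str, float], scale: float) -> dict[str, float]:
    """Utilizations after every period is multiplied by `scale` (0 < scale <= 1)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return {acc: u / scale for acc, u in util.items()}


def min_period_scale(util: dict[str, float]) -> float:
    """Smallest scale that keeps every utilization <= 1.

    u / scale <= 1 for all accelerators holds exactly when scale >= max(u),
    so the headroom is set by the most heavily loaded accelerator.
    """
    return max(util.values())


# Hypothetical per-accelerator utilizations of one candidate design.
util = {"acc0": 0.6, "acc1": 0.45}

s = min_period_scale(util)
scaled = scaled_utilizations(util, s)
print("periods can shrink to", s, "of their original length")
print({acc: round(u, 3) for acc, u in scaled.items()})
```

Under this reading, a design with a lower maximum utilization leaves more room to shorten periods, which is consistent with using maximum utilization as the search objective.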

Original abstract

Spatially partitioned heterogeneous accelerators (HAs) are increasingly adopted in embedded systems for their performance and flexibility. Yet most existing HA design frameworks optimize primarily for throughput or quality-of-service (QoS) metrics. They often overlook safety-critical real-time requirements, including hardware support for predictable execution, real-time-aware design space exploration (DSE), and rigorous schedulability analysis. These requirements are essential in safety-critical applications such as smart transportation, where schedulability guarantees directly affect system safety. To address this gap, we present PHAROS, a real-time-centric HA design framework. PHAROS introduces preemption mechanisms and scheduler designs for spatially partitioned HAs under first-in-first-out (FIFO) and earliest-deadline-first (EDF) policies. Leveraging modern real-time theory, we further develop a soft real-time (SRT) schedulability-oriented DSE with objectives and constraints tailored to SRT schedulability. Through comprehensive modeling, analysis, and evaluation across diverse applications, we show that PHAROS's DSE discovers more feasible configurations for a broader range of task sets than throughput-oriented DSE baselines while delivering improved real-time performance. We also provide response-time analyses for the supported scheduling algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PHAROS adds preemption mechanisms and an SRT-focused DSE to spatially partitioned heterogeneous accelerator design, but the response-time analyses risk being optimistic if hardware overheads are not tightly bounded.

The main takeaway is that this paper introduces PHAROS, a framework that brings hardware preemption support for FIFO and EDF scheduling plus a schedulability-oriented design space exploration to spatially partitioned heterogeneous accelerators. It targets safety-critical embedded systems where missing deadlines has direct safety implications, such as in smart transportation applications. The abstract positions this as filling a gap left by throughput- or QoS-centric HA design tools. That framing is reasonable given the cited real-time theory it builds on. The concrete mechanisms for preemption and the tailored DSE objectives are the actual new elements here.

The paper does a solid job of spelling out why existing frameworks fall short on predictable execution and rigorous analysis, then mapping modern SRT scheduling ideas onto the HA setting. If the modeling and evaluation hold, the claim that it finds more feasible task-set configurations than throughput baselines is a useful practical result for designers who must balance performance with deadline compliance.

The soft spot sits in the response-time analyses for the new preemption mechanisms. Standard partitioned scheduling analysis often assumes bounded or negligible preemption costs, but accelerators introduce variable latencies from pipeline state, partial reconfiguration, and cross-partition memory contention. The abstract states that the analyses support improved real-time performance, yet without explicit bounds or measurements of those overheads in the evaluation, the feasibility results could prove optimistic once implemented in silicon. The stress-test concern lands here: if those costs are data-dependent or unmodeled, the DSE-selected configurations may miss deadlines even when the math says they are feasible.

This paper is aimed at hardware and real-time systems researchers who work on embedded accelerators for deadline-driven domains. Readers looking for scheduler designs or DSE methods that incorporate SRT constraints would find the concrete proposals worth examining. It shows clear engagement with the literature and deserves a serious referee because the problem is well-motivated and the framework is specific enough to review in detail. I would recommend sending it out for peer review so that experts can check the tightness of the analyses and any implementation artifacts.

Referee Report

1 major / 2 minor

Summary. The paper presents PHAROS, a real-time-centric framework for designing spatially partitioned heterogeneous accelerators (HAs). It introduces hardware preemption mechanisms and scheduler designs supporting FIFO and EDF policies, develops an SRT schedulability-oriented design space exploration (DSE) with tailored objectives and constraints, and supplies response-time analyses for the supported algorithms. The central claim is that PHAROS's DSE identifies more feasible task-set configurations across a broader range of applications than throughput-oriented DSE baselines while delivering improved real-time performance and deadline compliance.

Significance. If the response-time analyses are sound and the DSE evaluations demonstrate the claimed advantages without unmodeled overheads, the work would be significant for safety-critical embedded systems. It bridges HA design with modern real-time theory to enable predictable execution on flexible accelerators, directly addressing gaps in current frameworks that prioritize throughput over schedulability guarantees.

major comments (1)

[Response-time analyses] Response-time analyses section: the SRT analyses for FIFO/EDF preemption on spatially partitioned HAs must explicitly bound or prove negligible all hardware-specific overheads (context save/restore, partial reconfiguration latency, pipeline state, and cross-partition memory contention); without this, the computed response times are optimistic and the DSE feasibility results cannot guarantee deadline compliance in silicon.
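
One conventional way to see how sensitive a feasibility verdict is to such overheads is to inflate each job's execution time by an assumed per-preemption bound and recheck utilization. The sketch below does only that. It assumes preemptive EDF, where a preemption can occur only at a job release, so charging one overhead per job is a common conservative accounting. The parameter `overhead`, the function names, and the job set are hypothetical; this is not the paper's analysis and it does not model memory contention.

```python
def inflated_utilization(jobs: list[tuple[float, float]], overhead: float) -> float:
    """Utilization of one accelerator after charging `overhead` to every job.

    jobs: (exec_time, period) pairs of the tasks mapped to this accelerator.
    overhead: assumed upper bound on the cost of a single preemption.
    """
    return sum((e + overhead) / p for e, p in jobs)


def max_tolerable_overhead(jobs: list[tuple[float, float]]) -> float:
    """Largest per-preemption overhead that keeps inflated utilization <= 1.

    Solves sum((e + x) / p) = 1 for x; a negative result means the task set
    is already overloaded with zero overhead.
    """
    base = sum(e / p for e, p in jobs)
    rate = sum(1.0 / p for _, p in jobs)
    return (1.0 - base) / rate


# Hypothetical tasks on one accelerator: (exec_time, period) in ms.
jobs = [(6.0, 20.0), (3.0, 10.0), (2.0, 8.0)]

for xi in (0.0, 0.25, 0.5, 1.0):
    u = inflated_utilization(jobs, xi)
    print(f"overhead={xi:.2f} ms  utilization={u:.4f}  feasible={u <= 1.0}")
print("max tolerable overhead (ms):", round(max_tolerable_overhead(jobs), 4))
```

A configuration that sits close to utilization 1 without overhead can flip to infeasible for small overhead values, which is the concern this comment raises.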

minor comments (2)

[Abstract] Abstract: the claim of 'comprehensive modeling, analysis, and evaluation across diverse applications' is stated without any quantitative summary (e.g., number of task sets, improvement percentages, or feasibility ratios); a concise results highlight would improve clarity.
[Evaluation] Evaluation: ensure all reported comparisons include the specific task-set parameters, number of runs, and any statistical measures so that the superiority over throughput baselines can be independently assessed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and the recommendation for major revision. The feedback highlights an important aspect of ensuring the soundness of our response-time analyses for safety-critical use. We address the major comment below and commit to the necessary revisions.

Referee: [Response-time analyses] Response-time analyses section: the SRT analyses for FIFO/EDF preemption on spatially partitioned HAs must explicitly bound or prove negligible all hardware-specific overheads (context save/restore, partial reconfiguration latency, pipeline state, and cross-partition memory contention); without this, the computed response times are optimistic and the DSE feasibility results cannot guarantee deadline compliance in silicon.

Authors: We agree that explicit bounds on hardware-specific overheads are required for the analyses to be non-optimistic and to support deadline-compliance guarantees. Our current response-time analyses for FIFO and EDF on spatially partitioned HAs model preemption costs and pipeline behavior at a high level, but we acknowledge that we have not derived or stated explicit upper bounds for all listed items (partial reconfiguration latency, pipeline state, and cross-partition memory contention). In the revised manuscript we will augment the Response-time analyses section with these bounds, using the hardware parameters already defined in the PHAROS model, or provide arguments for their negligibility under the stated assumptions. This change will directly strengthen the link between the DSE feasibility results and practical deadline compliance.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in PHAROS derivation chain

The paper introduces hardware preemption mechanisms for spatially partitioned HAs and develops SRT schedulability-oriented DSE plus response-time analyses by extending cited modern real-time theory. No load-bearing step reduces by construction to a fitted input, self-definition, or self-citation chain; the modeling, constraints, and evaluation across applications remain independent of the target claims. This is the expected non-finding for a framework paper that adds HA-specific mechanisms on top of established schedulability results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; no explicit free parameters or invented entities are named, but the framework rests on domain assumptions about hardware modeling and real-time theory applicability.

axioms (2)

domain assumption: Spatially partitioned heterogeneous accelerators can support preemption mechanisms under FIFO and EDF scheduling policies.
Foundational to the introduced scheduler designs and preemption support.
domain assumption: Soft real-time schedulability metrics can serve as primary objectives and constraints for design space exploration of accelerators.
Directly stated as the basis for the SRT schedulability-oriented DSE.
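
To make the second assumption concrete, here is a toy beam search that uses maximum utilization as the objective and a resource budget as the constraint. The figure captions above mention beam search, but everything in this sketch is hypothetical: the demand values, the PE (processing element) choices, the budget, and the cost model in which utilization equals demand divided by PE count. It is a generic beam search, not the PHAROS heuristic.

```python
from fractions import Fraction

# Hypothetical toy model: accelerator k has DEMAND[k] units of work per unit
# time, and giving it n PEs yields utilization DEMAND[k] / n.
DEMAND = (Fraction(12, 5), Fraction(6, 5), Fraction(3))
PE_CHOICES = (1, 2, 4, 8)  # candidate PE counts per accelerator
PE_BUDGET = 12             # total PEs available on the device


def max_utilization(design: tuple[int, ...]) -> Fraction:
    """Objective: highest utilization among the accelerators sized so far."""
    return max(DEMAND[k] / n for k, n in enumerate(design))


def within_budget(design: tuple[int, ...]) -> bool:
    """Constraint: a (partial) design may not use more PEs than the budget."""
    return sum(design) <= PE_BUDGET


def beam_search(beam_width: int):
    """Size accelerators one at a time, keeping the best `beam_width` partial designs."""
    beam = [()]
    for _ in DEMAND:
        candidates = [d + (n,) for d in beam for n in PE_CHOICES
                      if within_budget(d + (n,))]
        candidates.sort(key=max_utilization)  # lower max utilization is better
        beam = candidates[:beam_width]
        if not beam:
            return None  # every kept partial design ran out of budget
    return beam[0]


for width in (1, 2, 3):
    best = beam_search(width)
    score = None if best is None else max_utilization(best)
    print(f"beam_width={width}  design={best}  max_utilization={score}")
```

In this toy instance a width of 1 dead-ends, a width of 2 returns a design whose maximum utilization is 3/2 (not SRT-schedulable), and a width of 3 finds a design with maximum utilization 3/4. Wider beams cost more evaluations per step, which is the usual trade-off when tuning such a search.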

pith-pipeline@v0.9.0 · 5544 in / 1466 out tokens · 77684 ms · 2026-05-10T19:40:42.964653+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We formulate an SRT-oriented DSE, selecting maximum utilization as the optimization objective... the system is SRT-schedulable if and only if the utilization of every accelerator does not exceed 1"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat · unclear
Paper passage: "PHAROS introduces preemption mechanisms and scheduler designs for spatially partitioned HAs under first-in-first-out (FIFO) and earliest-deadline-first (EDF) policies"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1] Hyoukjun Kwon, Liangzhen Lai, Michael Pellauer, Tushar Krishna, Yu-Hsin Chen, and Vikas Chandra. Heterogeneous Dataflow Accelerators for Multi-DNN Workloads. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 71–83, 2021.

[2] William J. Dally, Yatish Turakhia, and Song Han. Domain-specific hardware accelerators. Commun. ACM, 63(7):48–57, June 2020.

[3] Jinming Zhuang, Jason Lau, Hanchen Ye, Zhuoping Yang, Yubo Du, Jack Lo, Kristof Denolf, Stephen Neuendorffer, Alex Jones, Jingtong Hu, Deming Chen, Jason Cong, and Peipei Zhou. CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture. In Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gat...

[4] Yujeong Choi and Minsoo Rhu. PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 220–233, 2020.

[5] Zheng Dong, Cong Liu, Alan Gatherer, Lee McFearin, Peter Yan, and James H. Anderson. Optimal Dataflow Scheduling on a Heterogeneous Multiprocessor With Reduced Response Time Bounds. In Marko Bertogna, editor, 29th Euromicro Conference on Real-Time Systems, ECRTS 2017, Dubrovnik, Croatia, June 27-30, 2017, LIPIcs, pages 15:1–15:22. Schloss Dagstuhl - Leibni...

[6] Jinming Zhuang, Zhuoping Yang, Shixin Ji, Heng Huang, Alex K. Jones, Jingtong Hu, Yiyu Shi, and Peipei Zhou. SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration. In Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’24, page 55–66, New York, NY, USA, 2024. A...

[7] Suhail Basalama and Jason Cong. Stream-HLS: Towards Automatic Dataflow Acceleration. In Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pages 103–114, 2025.

[8] Carol Jingyi Li, Xiangwei Li, Binglei Lou, Craig T Jin, David Boland, and Philip HW Leong. Fixed-point FPGA Implementation of the FFT Accumulation Method for Real-time Cyclostationary Analysis. ACM Transactions on Reconfigurable Technology and Systems, 16(3):1–28, 2023.

[9] Peiyan Dong, Jinming Zhuang, Zhuoping Yang, Shixin Ji, Yanyu Li, Dongkuan Xu, Heng Huang, Jingtong Hu, Alex K. Jones, Yiyu Shi, Yanzhi Wang, and Peipei Zhou. EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture. IEEE Transactions on Computer-Aided Design of Integrated Circ...

[10] Zhuoping Yang, Jinming Zhuang, Jiaqi Yin, Cunxi Yu, Alex K Jones, and Peipei Zhou. AIM: Accelerating Arbitrary-precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP. In 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pages 1–9. IEEE, 2023.

[11] Jinming Zhuang, Shaojie Xiang, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, Tony Mao, Zhiru Zhang, and Peipei Zhou. ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines. In Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’25, 2024.

[12] Xiaofan Zhang, Hanchen Ye, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. DNNExplorer: a framework for modeling and exploring a novel paradigm of FPGA-based DNN accelerator. In Proceedings of the 39th International Conference on Computer-Aided Design, ICCAD ’20, New York, NY, USA, 2020. Association for Computing Machinery.

[13] Jingwei Cai, Yuchen Wei, Zuotong Wu, Sen Peng, and Kaisheng Ma. Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators. In Proceedings of the 50th Annual International Symposium on Computer Architecture, ISCA ’23, New York, NY, USA, 2023. Association for Computing Machinery.

[14] Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen, Liangzhen Lai, and Vikas Chandra. DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pages 73–86, 2023.

[15] Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmendra Reddy Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung Kim, Cliff Young, and Hadi Esmaeilzadeh. Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks. In 2020 53rd Annual IEEE/ACM International Symposium...

[16] Chunyang Wang, Yuebin Bai, and Desen Sun. CD-MSA: Cooperative and Deadline-Aware Scheduling for Efficient Multi-Tenancy on DNN Accelerators. IEEE Transactions on Parallel and Distributed Systems, 34(7):2091–2106, 2023.

[17] Chengsi Gao, Ying Wang, Cheng Liu, Mengdi Wang, Weiwei Chen, Yinhe Han, and Lei Zhang. Layer-Puzzle: Allocating and Scheduling Multi-task on Multi-core NPUs by Using Layer Heterogeneity. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–6. IEEE, 2023.

[18] Young H. Oh, Seonghak Kim, Yunho Jin, Sam Son, Jonghyun Bae, Jongsung Lee, Yeonhong Park, Dong Uk Kim, Tae Jun Ham, and Jae W. Lee. Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 584–597, 2021.

[19] Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanović, Borivoje Nikolić, and Yakun Sophia Shao. MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 828–841, 2023.

[20] Shulin Zeng, Guohao Dai, Niansong Zhang, Xinhao Yang, Haoyu Zhang, Zhenhua Zhu, Huazhong Yang, and Yu Wang. Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective. IEEE Transactions on Computers, 72(5):1314–1328, 2023.

[21] Francesco Restuccia and Alessandro Biondi. Time-Predictable Acceleration of Deep Neural Networks on FPGA SoC Platforms. In 2021 IEEE Real-Time Systems Symposium (RTSS), pages 441–454, 2021.

[22] AMD. AMD Vitis™ AI Software.

[23] Jiapeng Guan, Ran Wei, Dean You, Yingquan Wang, Ruizhe Yang, Hui Wang, and Zhe Jiang. MESC: Re-thinking Algorithmic Priority and/or Criticality Inversions for Heterogeneous MCSs. In 2024 IEEE Real-Time Systems Symposium (RTSS), pages 1–14. IEEE, 2024.

[24] Shixin Ji, Xingzhen Chen, Wei Zhang, Zhuoping Yang, Jinming Zhuang, Sarah Schultz, Yukai Song, Jingtong Hu, Alex K. Jones, Zheng Dong, and Peipei Zhou. Towards Accelerator Customization in Real-time Safety-critical Systems. In Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’25, page 181, New York, NY, US...

[25] Shixin Ji, Xingzhen Chen, Jinming Zhuang, Wei Zhang, Zhuoping Yang, Sarah Schultz, Yukai Song, Jingtong Hu, Alex Jones, Zheng Dong, and Peipei Zhou. ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems. In Proceedings of the Great Lakes Symposium on VLSI 2025, GLSVLSI ’25, page 442–449, New York, NY, USA, 2025. Association for Co...

[26] Shixin Ji, Zhuoping Yang, Xingzhen Chen, Wei Zhang, Jinming Zhuang, Alex K. Jones, Zheng Dong, and Peipei Zhou. DERCA: DetERministic Cycle-Level Accelerator on Reconfigurable Platforms in DNN-Enabled Real-Time Safety-Critical Systems. In 2025 IEEE Real-Time Systems Symposium (RTSS), pages 392–405, 2025.

[27] C. L. Liu and James W. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. J. ACM, 20(1):46–61, January 1973.

[28] UmaMaheswari C. Devi and James H. Anderson. Tardiness Bounds under Global EDF Scheduling on a Multiprocessor. In Proceedings of the 26th IEEE International Real-Time Systems Symposium, RTSS ’05, page 330–341, USA, 2005. IEEE Computer Society.

[29] Zhe Jiang, Nathan Fisher, Nan Guan, and Zheng Dong. BlueFace: Integrating an Accelerator into the Core’s Pipeline through Algorithm-Interface Co-Design for Real-Time SoCs. In 2023 60th ACM/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2023.

[30] Zheng Dong and Cong Liu. Schedulability Analysis for Coscheduling Real-Time Tasks on Multiprocessors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11):4721–4732, 2022.

[31] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[32] Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point transformer v3: Simpler faster stronger. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4840–4851, 2024.

[33] Ilya O Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy. MLP-Mixer: An all-MLP Architecture for Vision. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural...

[34] Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, and Hervé Jégou. ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):5314–5321, 2023.

[35] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herve Jegou. Training data-efficient image transformers & distillation through attention. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 103...


work page 2024

[10] [10]

AIM: Accelerating Arbitrary-precision Integer Multiplication on Hetero- geneous Reconfigurable Computing Platform Versal ACAP

Zhuoping Yang, Jinming Zhuang, Jiaqi Yin, Cunxi Yu, Alex K Jones, and Peipei Zhou. AIM: Accelerating Arbitrary-precision Integer Multiplication on Hetero- geneous Reconfigurable Computing Platform Versal ACAP. In2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pages 1–9. IEEE, 2023

work page 2023

[11] [11]

ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines

Jinming Zhuang, Shaojie Xiang, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, Tony Mao, Zhiru Zhang, and Peipei Zhou. ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines. InProceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’25, 2024

work page 2025

[12] [12]

DNNExplorer: a framework for modeling and exploring a novel paradigm of FPGA-based DNN accelerator

Xiaofan Zhang, Hanchen Ye, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, and Deming Chen. DNNExplorer: a framework for modeling and exploring a novel paradigm of FPGA-based DNN accelerator. InProceedings of the 39th International Conference on Computer-Aided Design, ICCAD ’20, New York, NY, USA, 2020. Association for Computing Machinery

work page 2020

[13] [13]

Inter- layer Scheduling Space Definition and Exploration for Tiled Accelerators

Jingwei Cai, Yuchen Wei, Zuotong Wu, Sen Peng, and Kaisheng Ma. Inter- layer Scheduling Space Definition and Exploration for Tiled Accelerators. In Proceedings of the 50th Annual International Symposium on Computer Architecture, ISCA ’23, New York, NY, USA, 2023. Association for Computing Machinery

work page 2023

[14] [14]

DREAM: A Dynamic Scheduler for Dynamic Real- time Multi-model ML Workloads

Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen, Liangzhen Lai, and Vikas Chandra. DREAM: A Dynamic Scheduler for Dynamic Real- time Multi-model ML Workloads. InProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pages 73–86, 2023

work page 2023

[15] [15]

Planaria: Dy- namic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks

Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmen- dra Reddy Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung Kim, Cliff Young, and Hadi Esmaeilzadeh. Planaria: Dy- namic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks. In2020 53rd Annual IEEE/ACM International Symposium...

work page 2020

[16] [16]

CD-MSA: Cooperative and Deadline-Aware Scheduling for Efficient Multi-Tenancy on DNN Accelerators

Chunyang Wang, Yuebin Bai, and Desen Sun. CD-MSA: Cooperative and Deadline-Aware Scheduling for Efficient Multi-Tenancy on DNN Accelerators. IEEE Transactions on Parallel and Distributed Systems, 34(7):2091–2106, 2023

work page 2091

[17] [17]

Layer-Puzzle: Allocating and Scheduling Multi-task on Multi-core NPUs by Using Layer Heterogeneity

Chengsi Gao, Ying Wang, Cheng Liu, Mengdi Wang, Weiwei Chen, Yinhe Han, and Lei Zhang. Layer-Puzzle: Allocating and Scheduling Multi-task on Multi-core NPUs by Using Layer Heterogeneity. In2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–6. IEEE, 2023

work page 2023

[18] [18]

Oh, Seonghak Kim, Yunho Jin, Sam Son, Jonghyun Bae, Jongsung Lee, Yeonhong Park, Dong Uk Kim, Tae Jun Ham, and Jae W

Young H. Oh, Seonghak Kim, Yunho Jin, Sam Son, Jonghyun Bae, Jongsung Lee, Yeonhong Park, Dong Uk Kim, Tae Jun Ham, and Jae W. Lee. Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling. In2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 584–597, 2021

work page 2021

[19] [19]

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks

Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanović, Borivoje Nikolić, and Yakun Sophia Shao. MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks. In2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 828–841, 2023

work page 2023

[20] [20]

Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective.IEEE Transactions on Computers, 72(5):1314–1328, 2023

Shulin Zeng, Guohao Dai, Niansong Zhang, Xinhao Yang, Haoyu Zhang, Zhen- hua Zhu, Huazhong Yang, and Yu Wang. Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective.IEEE Transactions on Computers, 72(5):1314–1328, 2023

work page 2023

[21] [21]

Time-Predictable Acceleration of Deep Neural Networks on FPGA SoC Platforms

Francesco Restuccia and Alessandro Biondi. Time-Predictable Acceleration of Deep Neural Networks on FPGA SoC Platforms. In2021 IEEE Real-Time Systems Symposium (RTSS), pages 441–454, 2021

work page 2021

[22] [22]

AMD Vitis™AI Software

AMD. AMD Vitis™AI Software

work page

[23] [23]

MESC: Re-thinking Algorithmic Priority and/or Criticality Inversions for Heterogeneous MCSs

Jiapeng Guan, Ran Wei, Dean You, Yingquan Wang, Ruizhe Yang, Hui Wang, and Zhe Jiang. MESC: Re-thinking Algorithmic Priority and/or Criticality Inversions for Heterogeneous MCSs. In2024 IEEE Real-Time Systems Symposium (RTSS), pages 1–14. IEEE, 2024

work page 2024

[24] [24]

Jones, Zheng Dong, and Peipei Zhou

Shixin Ji, Xingzhen Chen, Wei Zhang, Zhuoping Yang, Jinming Zhuang, Sarah Schultz, Yukai Song, Jingtong Hu, Alex K. Jones, Zheng Dong, and Peipei Zhou. Towards Accelerator Customization in Real-time Safety-critical Systems. InPro- ceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’25, page 181, New York, NY, US...

work page 2025

[25] [25]

ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems

Shixin Ji, Xingzhen Chen, Jinming Zhuang, Wei Zhang, Zhuoping Yang, Sarah Schultz, Yukai Song, Jingtong Hu, Alex Jones, Zheng Dong, and Peipei Zhou. ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems. InProceedings of the Great Lakes Symposium on VLSI 2025, GLSVLSI ’25, page 442–449, New York, NY, USA, 2025. Association for Co...

work page 2025

[26] [26]

Jones, Zheng Dong, and Peipei Zhou

Shixin Ji, Zhuoping Yang, Xingzhen Chen, Wei Zhang, Jinming Zhuang, Alex K. Jones, Zheng Dong, and Peipei Zhou. DERCA: DetERministic Cycle-Level Accel- erator on Reconfigurable Platforms in DNN-Enabled Real-Time Safety-Critical Systems. In2025 IEEE Real-Time Systems Symposium (RTSS), pages 392–405, 2025

work page 2025

[27] [27]

C. L. Liu and James W. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment.J. ACM, 20(1):46–61, January 1973

work page 1973

[28] [28]

Devi and James H

UmaMaheswari C. Devi and James H. Anderson. Tardiness Bounds under Global EDF Scheduling on a Multiprocessor. InProceedings of the 26th IEEE Interna- tional Real-Time Systems Symposium, RTSS ’05, page 330–341, USA, 2005. IEEE Computer Society

work page 2005

[29] [29]

BlueFace: Integrating an Accelerator into the Core’s Pipeline through Algorithm-Interface Co-Design for Real-Time SoCs

Zhe Jiang, Nathan Fisher, Nan Guan, and Zheng Dong. BlueFace: Integrating an Accelerator into the Core’s Pipeline through Algorithm-Interface Co-Design for Real-Time SoCs. In2023 60th ACM/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2023

work page 2023

[30] [30]

Schedulability Analysis for Coscheduling Real- Time Tasks on Multiprocessors.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11):4721–4732, 2022

Zheng Dong and Cong Liu. Schedulability Analysis for Coscheduling Real- Time Tasks on Multiprocessors.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11):4721–4732, 2022

work page 2022

[31] [31]

Qi, Hao Su, Kaichun Mo, and Leonidas J

Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

work page 2017

[32] [32]

Point transformer v3: Simpler faster stronger

Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point transformer v3: Simpler faster stronger. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4840–4851, 2024

work page 2024

[33] [33]

MLP-Mixer: An all-MLP Archi- tecture for Vision

Ilya O Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy. MLP-Mixer: An all-MLP Archi- tecture for Vision. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural...

work page 2021

[34] [34]

ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):5314–5321, 2023

Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, and Hervé Jégou. ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):5314–5321, 2023

work page 2023

[35] [35]

Training data-efficient image transformers & distillation through attention

Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herve Jegou. Training data-efficient image transformers & distillation through attention. In Marina Meila and Tong Zhang, editors,Pro- ceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 103...

work page 2021