PHAROS: Pipelined Heterogeneous Accelerators for Real-time Safety-critical Systems With Deadline Compliance
Pith reviewed 2026-05-10 19:40 UTC · model grok-4.3
The pith
PHAROS shows that adding soft real-time schedulability to accelerator design exploration finds workable hardware setups for more task sets than throughput-only methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through modeling, analysis, and evaluation, PHAROS demonstrates that its soft real-time (SRT) schedulability-oriented design space exploration (DSE) discovers more feasible configurations for a broader range of task sets than throughput-oriented DSE baselines, while delivering improved real-time performance. The framework also introduces preemption mechanisms and scheduler designs for spatially partitioned heterogeneous accelerators under FIFO and EDF policies, and provides response-time analyses for those scheduling algorithms.
What carries the argument
Soft real-time (SRT) schedulability-oriented design space exploration (DSE) that tailors objectives and constraints to schedulability, paired with preemption mechanisms for spatially partitioned heterogeneous accelerators under FIFO and EDF scheduling.
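As a rough illustration of what a schedulability-oriented constraint can look like, the sketch below applies the condition quoted from the paper further down this page (the system is SRT-schedulable if and only if the utilization of every accelerator does not exceed 1) to a candidate task-to-accelerator mapping. The task model (one execution time and one period per pipeline stage) and the names `Stage`, `accelerator_utilizations`, `is_srt_schedulable`, and `max_utilization` are assumptions made for this example, not PHAROS's actual data structures or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline stage of a periodic task, pinned to one accelerator (hypothetical model)."""
    task: str
    accelerator: str
    exec_time_ms: float  # execution time of this stage on its accelerator
    period_ms: float     # period of the task that owns the stage


def accelerator_utilizations(stages):
    """Sum exec_time / period over the stages mapped to each accelerator."""
    util = defaultdict(float)
    for s in stages:
        util[s.accelerator] += s.exec_time_ms / s.period_ms
    return dict(util)


def is_srt_schedulable(stages):
    """Quoted condition: every accelerator's utilization is at most 1."""
    return all(u <= 1.0 for u in accelerator_utilizations(stages).values())


def max_utilization(stages):
    """The quantity the quoted passage names as the DSE optimization objective."""
    return max(accelerator_utilizations(stages).values())


# Illustrative numbers only; they are not taken from the paper.
example = [
    Stage("detect", "acc0", 4.0, 10.0),
    Stage("detect", "acc1", 3.0, 10.0),
    Stage("track", "acc0", 5.0, 20.0),
    Stage("track", "acc1", 8.0, 20.0),
]
print(accelerator_utilizations(example), is_srt_schedulable(example))
```

A throughput-oriented exploration would rank candidate configurations by a different metric; here a configuration is only feasible if the per-accelerator check passes, and `max_utilization` gives a single number by which feasible configurations could be compared.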
If this is right
- A wider variety of task sets from safety-critical applications can be scheduled while meeting soft real-time schedulability constraints.
- Hardware configurations can be selected to provide stronger guarantees on execution predictability.
- Designers can optimize accelerator systems specifically for real-time constraints instead of average-case throughput.
- Response-time bounds become available for FIFO and EDF schedulers running on these accelerators.
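To make the terms "response time" and the FIFO/EDF distinction concrete, here is a toy simulation. It is a minimal sketch under strong assumptions: a single accelerator, non-preemptive execution, and known execution times. It does not model the paper's preemption mechanisms or reproduce its analyses, and the names `Job` and `simulate` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    release: float    # arrival time
    exec_time: float  # execution time on the accelerator
    deadline: float   # absolute deadline


def simulate(jobs, policy):
    """Run jobs non-preemptively on one accelerator under 'FIFO' or 'EDF'.

    Returns {job name: (response_time, tardiness)}.
    """
    if policy not in ("FIFO", "EDF"):
        raise ValueError(f"unknown policy: {policy}")
    # FIFO orders ready jobs by release time, EDF by absolute deadline.
    priority = (lambda j: j.release) if policy == "FIFO" else (lambda j: j.deadline)
    pending = sorted(jobs, key=lambda j: j.release)
    ready = []  # heap of (priority, tie-breaker, job)
    now, nxt, seq = 0.0, 0, 0
    results = {}
    while nxt < len(pending) or ready:
        # Admit every job that has been released by the current time.
        while nxt < len(pending) and pending[nxt].release <= now:
            heapq.heappush(ready, (priority(pending[nxt]), seq, pending[nxt]))
            seq += 1
            nxt += 1
        if not ready:
            now = pending[nxt].release  # idle until the next release
            continue
        _, _, job = heapq.heappop(ready)
        now += job.exec_time  # run the chosen job to completion
        results[job.name] = (now - job.release, max(0.0, now - job.deadline))
    return results


# Illustrative job set, not taken from the paper.
jobs = [Job("A", 0.0, 6.0, 20.0), Job("B", 1.0, 4.0, 30.0), Job("C", 2.0, 2.0, 9.0)]
fifo = simulate(jobs, "FIFO")
edf = simulate(jobs, "EDF")
print("FIFO:", fifo)
print("EDF: ", edf)
```

In this job set, job C completes 3 time units after its deadline under FIFO and on time under EDF, because EDF runs it ahead of the less urgent job B. A response-time analysis replaces this kind of single-trace simulation with a bound that holds over all legal arrival patterns.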
Where Pith is reading between the lines
- The same schedulability-driven exploration approach could be extended to hard real-time constraints by strengthening the underlying analyses.
- Future accelerator hardware designs might incorporate the preemption features as standard primitives once overheads are quantified.
- Similar real-time-aware design space exploration may prove useful for other embedded platforms such as GPUs or reconfigurable fabrics.
Load-bearing premise
The introduced preemption mechanisms and soft real-time schedulability analysis for spatially partitioned heterogeneous accelerators can be realized in hardware without significant unmodeled overheads or interference that would invalidate the deadline guarantees.
What would settle it
A hardware implementation of the preemption mechanisms on a spatially partitioned heterogeneous accelerator, with measurements showing whether task sets predicted as schedulable by the analysis actually meet all deadlines under realistic interference.
Original abstract
Spatially partitioned heterogeneous accelerators (HAs) are increasingly adopted in embedded systems for their performance and flexibility. Yet most existing HA design frameworks optimize primarily for throughput or quality-of-service (QoS) metrics. They often overlook safety-critical real-time requirements, including hardware support for predictable execution, real-time-aware design space exploration (DSE), and rigorous schedulability analysis. These requirements are essential in safety-critical applications such as smart transportation, where schedulability guarantees directly affect system safety. To address this gap, we present PHAROS, a real-time-centric HA design framework. PHAROS introduces preemption mechanisms and scheduler designs for spatially partitioned HAs under first-in-first-out (FIFO) and earliest-deadline-first (EDF) policies. Leveraging modern real-time theory, we further develop a soft real-time (SRT) schedulability-oriented DSE with objectives and constraints tailored to SRT schedulability. Through comprehensive modeling, analysis, and evaluation across diverse applications, we show that PHAROS's DSE discovers more feasible configurations for a broader range of task sets than throughput-oriented DSE baselines while delivering improved real-time performance. We also provide response-time analyses for the supported scheduling algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents PHAROS, a real-time-centric framework for designing spatially partitioned heterogeneous accelerators (HAs). It introduces hardware preemption mechanisms and scheduler designs supporting FIFO and EDF policies, develops a soft real-time (SRT) schedulability-oriented design space exploration (DSE) with tailored objectives and constraints, and supplies response-time analyses for the supported algorithms. The central claim is that PHAROS's DSE finds more feasible configurations for a broader range of task sets than throughput-oriented DSE baselines while delivering improved real-time performance.
Significance. If the response-time analyses are sound and the DSE evaluations demonstrate the claimed advantages without unmodeled overheads, the work would be significant for safety-critical embedded systems. It bridges HA design with modern real-time theory to enable predictable execution on flexible accelerators, directly addressing gaps in current frameworks that prioritize throughput over schedulability guarantees.
major comments (1)
- [Response-time analyses] Response-time analyses section: the SRT analyses for FIFO/EDF preemption on spatially partitioned HAs must explicitly bound or prove negligible all hardware-specific overheads (context save/restore, partial reconfiguration latency, pipeline state, and cross-partition memory contention); without this, the computed response times are optimistic and the DSE feasibility results cannot guarantee deadline compliance in silicon.
minor comments (2)
- [Abstract] Abstract: the claim of 'comprehensive modeling, analysis, and evaluation across diverse applications' is stated without any quantitative summary (e.g., number of task sets, improvement percentages, or feasibility ratios); a concise results highlight would improve clarity.
- [Evaluation] Evaluation: ensure all reported comparisons include the specific task-set parameters, number of runs, and any statistical measures so that the superiority over throughput baselines can be independently assessed.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation for major revision. The feedback highlights an important aspect of ensuring the soundness of our response-time analyses for safety-critical use. We address the major comment below and commit to the necessary revisions.
- Referee: [Response-time analyses] Response-time analyses section: the SRT analyses for FIFO/EDF preemption on spatially partitioned HAs must explicitly bound or prove negligible all hardware-specific overheads (context save/restore, partial reconfiguration latency, pipeline state, and cross-partition memory contention); without this, the computed response times are optimistic and the DSE feasibility results cannot guarantee deadline compliance in silicon.
  Authors: We agree that explicit bounds on hardware-specific overheads are required for the analyses to be non-optimistic and to support deadline-compliance guarantees. Our current response-time analyses for FIFO and EDF on spatially partitioned HAs model preemption costs and pipeline behavior at a high level, but we acknowledge that we have not derived or stated explicit upper bounds for all listed items (partial reconfiguration latency, pipeline state, and cross-partition memory contention). In the revised manuscript we will augment the Response-time analyses section with these bounds, using the hardware parameters already defined in the PHAROS model, or provide arguments for their negligibility under the stated assumptions. This change will directly strengthen the link between the DSE feasibility results and practical deadline compliance.
  Revision: yes
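The referee's concern about optimistic analyses can be illustrated with a small calculation. The sketch below uses a common simplification from real-time analysis, inflating each job's execution time by a bounded overhead before checking utilization. It is not the paper's analysis: the function names, the fixed per-job preemption count, and all numbers are assumptions chosen for illustration.

```python
def inflated_exec_time(exec_time_ms, max_preemptions, preempt_cost_ms, contention_ms=0.0):
    """Charge each job a bounded number of preemptions plus a contention allowance.

    All parameters are hypothetical bounds supplied by the caller.
    """
    return exec_time_ms + max_preemptions * preempt_cost_ms + contention_ms


def inflated_utilization(tasks, max_preemptions, preempt_cost_ms, contention_ms=0.0):
    """tasks: iterable of (exec_time_ms, period_ms) pairs sharing one accelerator."""
    return sum(
        inflated_exec_time(c, max_preemptions, preempt_cost_ms, contention_ms) / t
        for c, t in tasks
    )


# Illustrative task set, not taken from the paper.
tasks = [(4.0, 10.0), (5.0, 10.0)]
ideal = inflated_utilization(tasks, max_preemptions=0, preempt_cost_ms=0.0)
with_overhead = inflated_utilization(tasks, max_preemptions=1, preempt_cost_ms=0.6)
print(f"ideal={ideal:.2f}, with overhead={with_overhead:.2f}")
```

With no overhead the example accelerator sits at a utilization of about 0.90 and would pass a utilization-at-most-1 check. Charging each job one preemption of 0.6 ms pushes it to about 1.02, so the same configuration would no longer pass. This is the sense in which unmodeled overheads could turn a configuration reported as feasible into an infeasible one.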
Circularity Check
No significant circularity in PHAROS derivation chain
The paper introduces hardware preemption mechanisms for spatially partitioned HAs and develops SRT schedulability-oriented DSE plus response-time analyses by extending cited modern real-time theory. No load-bearing step reduces by construction to a fitted input, self-definition, or self-citation chain; the modeling, constraints, and evaluation across applications remain independent of the target claims. This is the expected non-finding for a framework paper that adds HA-specific mechanisms on top of established schedulability results.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Spatially partitioned heterogeneous accelerators can support preemption mechanisms under FIFO and EDF scheduling policies.
- domain assumption: Soft real-time schedulability metrics can serve as primary objectives and constraints for design space exploration of accelerators.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We formulate an SRT-oriented DSE, selecting maximum utilization as the optimization objective... the system is SRT-schedulable if and only if the utilization of every accelerator does not exceed 1
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: PHAROS introduces preemption mechanisms and scheduler designs for spatially partitioned HAs under first-in-first-out (FIFO) and earliest-deadline-first (EDF) policies
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.