Stannic: Systolic STochAstic ONliNe SchedulIng AcCelerator

Adam H. Ross; Debjit Pal; Vairavan Palaniappan

arXiv: 2507.01113 · v3 · submitted 2025-07-01 · cs.DC · cs.SY · eess.SY


Pith reviewed 2026-05-19 06:13 UTC · model grok-4.3


keywords: stochastic scheduling · FPGA accelerator · heterogeneous computing · online scheduling · HPC · systolic array · workload balancing


The pith

A systolic FPGA accelerator produces heterogeneity-aware schedules for stochastic workloads in near real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Stannic as a systolic microarchitecture that accelerates a non-preemptive stochastic online scheduling algorithm for heterogeneous systems. It shifts from a task-centric approach in an earlier design to a schedule-centric abstraction that exploits parallelism, pre-calculation, and spatial memory access. This produces schedules that balance unpredictable job arrivals and processing times across machines of varying capabilities. A reader would care because software schedulers in shared HPC clusters often cannot adapt fast enough to stochastic conditions without excessive overhead.

Core claim

The paper claims that Stannic, by inheriting a schedule-centric abstraction on FPGA hardware, reduces latency per computation iteration by 7.5 times and increases the supported size of the target heterogeneous system by 14 times compared with prior hardware acceleration, while still generating schedules that achieve efficient machine utilization and low average job latency under stochastic conditions.

What carries the argument

The Stannic systolic accelerator, which uses a schedule-centric abstraction to parallelize the computation of heterogeneity-aware schedules.

Load-bearing premise

The hardware correctly executes the full stochastic scheduling logic and produces exactly the same schedule decisions as a correct software implementation.

What would settle it

A side-by-side run on identical stochastic job traces where the hardware-generated schedules show higher average job latency or lower overall machine utilization than the original software algorithm.
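As a rough illustration of such a side-by-side check, the Python sketch below replays two job-to-machine assignments on the same trace and reports average job latency and per-machine utilization. The `Job` record, the `replay` helper, the FIFO order within each machine, and the sample trace and assignments are all assumptions made for illustration; none of them come from the paper, whose scheduler may order jobs differently within a machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    arrival: float     # time the job enters the system
    durations: tuple   # realized processing time on each machine (hypothetical values)


def replay(jobs, assignment, num_machines):
    """Replay a fixed assignment non-preemptively, FIFO per machine (an assumption).

    Returns (average job latency, per-machine utilization over the makespan).
    """
    free_at = [0.0] * num_machines   # time at which each machine becomes idle
    busy = [0.0] * num_machines      # accumulated busy time per machine
    total_latency = 0.0
    for job in sorted(jobs, key=lambda j: j.arrival):
        m = assignment[job.job_id]
        start = max(free_at[m], job.arrival)
        finish = start + job.durations[m]
        free_at[m] = finish
        busy[m] += job.durations[m]
        total_latency += finish - job.arrival
    makespan = max(free_at)
    return total_latency / len(jobs), [b / makespan for b in busy]


# Hypothetical trace and decisions; real runs would use recorded scheduler outputs.
trace = [
    Job(0, 0.0, (4.0, 2.0)),
    Job(1, 1.0, (3.0, 6.0)),
    Job(2, 2.0, (2.0, 1.0)),
]
software = {0: 1, 1: 0, 2: 1}   # stand-in for the software reference decisions
hardware = {0: 1, 1: 0, 2: 0}   # stand-in for the accelerator decisions

sw_latency, sw_util = replay(trace, software, 2)
hw_latency, hw_util = replay(trace, hardware, 2)
print("software:", sw_latency, sw_util)
print("hardware:", hw_latency, hw_util)
```

In this made-up trace the two assignments differ on one job, and the replay makes the resulting latency and utilization gap visible. On real traces, identical assignments would yield identical metrics under this replay.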

Figures

Figures reproduced from arXiv: 2507.01113 by Adam H. Ross, Debjit Pal, Vairavan Palaniappan.

**Figure 1.** Algorithmic flow for stochastic online scheduling. Phase I prepares a job for the scheduler; Phase II and Phase III show the steps involved in scheduling the job.

**Figure 2.** Top-level block diagram of the HERCULES scheduler. Phase II and III are the phases shown in …

**Figure 4.** (a): Cost Calculator. TAH: Tree Adder to compute costH. TAL: Tree Adder to compute costL. N: number of jobs in each machine. In: {K.ID, sumH, sumL, T K i } ×N. Out: {sumH, sumL} ×N. (b): Individual Job Cost Calculator. (c): αJ check module. CAM: Content Addressable Memory. K.ID is used as the tag for content matching and data retrieval.

**Figure 6.** (a): Various quantization techniques applied to each job attribute. Green highlights the most suitable quantization. (b): Scheduled job distribution in each machine. (c): % Error in αJ. (d): % Error in WSPT.

**Figure 7.** (a): Average machine utilization across emulations. The darker the color, the more jobs are assigned to the machine. (b): Scheduler throughput across emulations.

**Figure 10.** Job distribution and average latency across M1 – M5 under varied workloads. SOS: Stochastic Online Scheduler; RR: Round Robin Scheduler; WSRR: Work Stealing Round Robin Scheduler; WSG: Work Stealing Greedy Scheduler.

Original abstract

Efficient workload scheduling is a critical challenge in modern heterogeneous computing environments, particularly in high-performance computing (HPC) systems. Traditional software-based schedulers struggle to efficiently balance workloads due to scheduling overhead, lack of adaptability to stochastic workloads, and suboptimal resource utilization. The scheduling problem further compounds in the context of shared HPC clusters, where job arrivals and processing times are inherently stochastic. Prediction of these elements is possible, but it introduces additional overhead. To perform this complex scheduling, we developed two FPGA-assisted hardware accelerator microarchitectures, Hercules and Stannic. Hercules adopts a task-centric abstraction of stochastic scheduling, whereas Stannic inherits a schedule-centric abstraction. These hardware-assisted solutions leverage parallelism, pre-calculation, and spatial memory access to significantly accelerate scheduling. We accelerate a non-preemptive stochastic online scheduling algorithm to produce heterogeneity-aware schedules in near real time. With Hercules, we achieved a speedup of up to 1060x over a baseline C/C++ implementation, demonstrating the efficacy of a hardware-assisted acceleration for heterogeneity-aware stochastic scheduling. With Stannic, we further improved efficiency, achieving a 7.5x reduction in latency per computation iteration and a 14x increase in the target heterogeneous system size. Experimental results show that the resulting schedules demonstrate efficient machine utilization and low average job latency in stochastic contexts.
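The abstract names a non-preemptive stochastic online scheduling algorithm but this page does not state its decision rule. As a rough illustration only, the Python sketch below shows a textbook-style greedy rule from the stochastic online scheduling literature: each arriving job goes to the machine where it causes the smallest expected increase in total weighted completion time, assuming every machine runs its queue in WSPT order (weight divided by expected processing time, largest first). The function names, the cost expression, and the sample arrivals are assumptions for illustration; the text here does not confirm that Stannic or Hercules compute exactly this cost.

```python
def wspt_ratio(weight, expected_time):
    """WSPT priority: weight over expected processing time (higher runs first)."""
    return weight / expected_time


def assignment_cost(queue, weight, expected_time):
    """Expected increase in total weighted completion time if the job joins `queue`.

    `queue` holds (weight, expected_time) pairs already assigned to one machine.
    """
    ratio = wspt_ratio(weight, expected_time)
    # Jobs with equal or higher priority run first and delay the new job.
    delay_from_higher = sum(p for w, p in queue if wspt_ratio(w, p) >= ratio)
    # Jobs with lower priority are each delayed by the new job's expected time.
    weight_of_lower = sum(w for w, p in queue if wspt_ratio(w, p) < ratio)
    return weight * (delay_from_higher + expected_time) + expected_time * weight_of_lower


def greedy_assign(queues, weight, expected_times):
    """Place the job on the machine with the smallest expected cost increase."""
    costs = [assignment_cost(q, weight, p) for q, p in zip(queues, expected_times)]
    best = min(range(len(queues)), key=costs.__getitem__)
    queues[best].append((weight, expected_times[best]))
    return best, costs


# Two heterogeneous machines; each arrival lists its expected time per machine.
queues = [[], []]
arrivals = [
    (2.0, (4.0, 2.0)),
    (1.0, (5.0, 1.0)),
    (3.0, (2.0, 3.0)),
]
decisions = []
for weight, expected_times in arrivals:
    machine, costs = greedy_assign(queues, weight, expected_times)
    decisions.append(machine)
    print(f"weight={weight} costs={costs} -> machine {machine}")
```

The third arrival is cheaper on the empty machine even though two earlier jobs preferred the other one, which is the kind of heterogeneity-aware trade-off the abstract refers to.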

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The FPGA accelerators for stochastic scheduling deliver large claimed speedups but rest on thin evidence that hardware matches software schedule quality.


The paper's main contribution is two FPGA microarchitectures, Hercules and Stannic, for accelerating non-preemptive stochastic online scheduling in heterogeneous HPC systems. Hercules uses a task-centric abstraction and reports up to 1060x speedup over a C/C++ baseline. Stannic moves to a schedule-centric design, cutting latency per iteration by 7.5x and supporting 14x larger target systems while still producing heterogeneity-aware schedules in near real time.

Referee Report

2 major / 0 minor

Summary. The manuscript presents two FPGA-based hardware accelerators, Hercules and Stannic, for non-preemptive stochastic online scheduling in heterogeneous HPC systems. Hercules uses a task-centric abstraction while Stannic adopts a schedule-centric abstraction with systolic parallelism and pre-calculation. The authors report a speedup of up to 1060x over a C/C++ baseline with Hercules, and with Stannic a 7.5x reduction in latency per iteration plus a 14x increase in supported system size, while claiming efficient machine utilization and low average job latency for stochastic workloads.

Significance. If the hardware faithfully reproduces the software scheduler's decisions, the work could enable near-real-time heterogeneity-aware scheduling for large stochastic workloads in shared HPC clusters, where software overhead currently limits adaptability. The shift to schedule-centric systolic design and the reported scaling improvements would be a notable engineering contribution for hardware-accelerated resource management.

major comments (2)

[Abstract] Abstract: the performance claims (1060x speedup, 7.5x latency reduction, 14x system-size increase) are stated without any description of experimental methodology, workload generation model, number of trials, baseline C/C++ implementation details, or hardware resource counts, leaving the central speedup results unsupported.
[Evaluation] Evaluation section: no quantitative equivalence data (machine utilization, average job latency, or schedule-quality metrics) is provided comparing hardware outputs to the software reference implementation. Without this, it is impossible to confirm that fixed-point arithmetic, pseudo-random generation, or spatial memory access in the systolic design preserve the stochastic decision distribution.
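One way to probe the fixed-point concern in the second comment is to quantize job attributes and measure how far the resulting priorities drift. The Python sketch below is a minimal example under stated assumptions: it uses a plain uniform unsigned quantizer, treats the priority as weight divided by expected processing time, and runs on made-up jobs. The helper names, the quantization scheme, and the value ranges are hypothetical and are not the paper's quantization design.

```python
def quantize(value, max_value, bits):
    """Uniformly quantize `value` in (0, max_value] to `bits` bits, then dequantize.

    The code is clamped to at least 1 so the result is never zero (an assumption
    made here to keep the ratio below well defined).
    """
    levels = (1 << bits) - 1
    step = max_value / levels
    code = min(levels, max(1, round(value / step)))
    return code * step


def quantized_jobs(jobs, max_weight, max_time, bits):
    """Quantize every (weight, expected_time) pair."""
    return [(quantize(w, max_weight, bits), quantize(p, max_time, bits)) for w, p in jobs]


def wspt_percent_errors(jobs, max_weight, max_time, bits):
    """Percent error of the quantized weight/time ratio against the exact ratio."""
    errors = []
    for (w, p), (qw, qp) in zip(jobs, quantized_jobs(jobs, max_weight, max_time, bits)):
        exact = w / p
        errors.append(100.0 * abs(qw / qp - exact) / exact)
    return errors


def priority_order(jobs):
    """Job indices sorted by ratio, highest priority first."""
    return sorted(range(len(jobs)), key=lambda i: jobs[i][0] / jobs[i][1], reverse=True)


# Hypothetical (weight, expected_time) pairs with both attributes in (0, 10].
jobs = [(8.0, 2.0), (3.0, 6.0), (5.0, 5.0)]
for bits in (4, 8, 16):
    errors = wspt_percent_errors(jobs, 10.0, 10.0, bits)
    kept = priority_order(quantized_jobs(jobs, 10.0, 10.0, bits)) == priority_order(jobs)
    print(bits, [round(e, 3) for e in errors], "order preserved:", kept)
```

Percent error alone does not show whether decisions change, so the sketch also checks whether the priority order survives quantization. That order check is closer to the equivalence question the referee raises.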

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and agree that greater detail on methodology and validation will strengthen the work. Revisions have been prepared to incorporate these suggestions.


Referee: [Abstract] Abstract: the performance claims (1060x speedup, 7.5x latency reduction, 14x system-size increase) are stated without any description of experimental methodology, workload generation model, number of trials, baseline C/C++ implementation details, or hardware resource counts, leaving the central speedup results unsupported.

Authors: We agree that the abstract would benefit from additional context to support the performance claims. In the revised version we will add a concise description of the workload generation model, number of trials performed, key aspects of the C/C++ baseline implementation, and hardware resource counts. Full experimental details remain in the Evaluation section, but the abstract update will make the central results more self-contained.

Revision: yes

Referee: [Evaluation] Evaluation section: no quantitative equivalence data (machine utilization, average job latency, or schedule-quality metrics) is provided comparing hardware outputs to the software reference implementation. Without this, it is impossible to confirm that fixed-point arithmetic, pseudo-random generation, or spatial memory access in the systolic design preserve the stochastic decision distribution.

Authors: The Evaluation section reports machine utilization and average job latency results for the hardware accelerators on stochastic workloads. We acknowledge, however, that direct quantitative equivalence metrics comparing hardware schedule quality and decision distributions to the software reference are not explicitly tabulated. In the revision we will add these comparisons, including schedule-quality scores, statistical similarity measures between hardware and software decisions, and targeted checks on the fixed-point and pseudo-random components to confirm preservation of the stochastic behavior.

Revision: yes
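The rebuttal mentions statistical similarity measures between hardware and software decisions without naming any. As one possible reading, the Python sketch below computes two simple measures over per-job machine choices: the fraction of jobs on which both schedulers agree, and the total variation distance between the two machine-usage distributions. Both measures, the function names, and the sample decision lists are assumptions chosen for illustration, not the authors' stated method.

```python
from collections import Counter


def agreement_rate(sw_decisions, hw_decisions):
    """Fraction of jobs assigned to the same machine by both schedulers."""
    matches = sum(a == b for a, b in zip(sw_decisions, hw_decisions))
    return matches / len(sw_decisions)


def total_variation(sw_decisions, hw_decisions, num_machines):
    """Total variation distance between the two machine-usage distributions.

    0.0 means identical usage shares; 1.0 means completely disjoint usage.
    """
    sw_counts, hw_counts = Counter(sw_decisions), Counter(hw_decisions)
    n_sw, n_hw = len(sw_decisions), len(hw_decisions)
    return 0.5 * sum(
        abs(sw_counts[m] / n_sw - hw_counts[m] / n_hw) for m in range(num_machines)
    )


# Hypothetical per-job machine choices for the same four-job trace.
sw = [0, 1, 1, 2]
hw = [0, 1, 2, 2]
print("agreement:", agreement_rate(sw, hw))
print("total variation:", total_variation(sw, hw, 3))
```

The two numbers answer different questions: agreement is per job, while total variation only compares aggregate load shares, so two schedulers can match in distribution yet disagree on individual jobs.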

Circularity Check

0 steps flagged

No circularity: speedups derived from direct hardware-vs-software timing measurements


The paper's core claims are empirical speedups (1060x for Hercules, 7.5x latency reduction and 14x scale increase for Stannic) obtained by comparing FPGA hardware execution time against a baseline C/C++ software implementation of the same non-preemptive stochastic online scheduler. No equations, fitted parameters, or self-citations are used to derive the reported performance numbers; the results follow from direct benchmarking of the implemented microarchitectures. The schedule-quality statements are presented as experimental observations rather than predictions forced by construction. The derivation chain is therefore self-contained against external timing measurements and does not reduce to any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The design rests on standard domain assumptions about stochastic workloads and FPGA capabilities with no free parameters, invented entities, or ad-hoc axioms visible in the abstract.

axioms (1)

Domain assumption: Job arrivals and processing times in shared HPC clusters are inherently stochastic. Invoked in the abstract as the core challenge that software schedulers struggle with.



Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

[1] Juan Fang, Jiaxing Zhang, Shuaibing Lu, and Hui Zhao. Exploration on Task Scheduling Strategy for CPU-GPU Heterogeneous Computing System. IEEE Computer Society Annual Symp. on VLSI (ISVLSI), 2020.

[2] Lanjun Wan, Weihua Zheng, and Xinpan Yuan. Efficient Inter-Device Task Scheduling Schemes for Multi-Device Co-Processing of Data-Parallel Kernels on Heterogeneous Systems. IEEE Access, 2021.

[3] Mohammad Navid Habibpour Roudsari. Improved Task Scheduling in Heterogeneous Distributed Systems using Intelligent Greedy Harris Hawk Optimization Algorithm. Evol. Intel. (EI), 2024.

[4] Yifan Liu, Jinchao Chen, Jiangong Yang, Chenglie Du, and Xiaoyan Du. Uncertainty-Aware Online Deadline-Constrained Scheduling of Parallel Applications in Distributed Heterogeneous Systems. Computers & Industrial Engineering, 2024.

[5] Peilun Du, Zichang Sun, Haitao Zhang, and Huadong Ma. Feature-Aware Task Scheduling on CPU-FPGA Heterogeneous Platforms. Int’l Conf. on High Performance Computing and Communications (HPCC), 2019.

[6] Serif Yesil and Ozcan Ozturk. Scheduling for Heterogeneous Systems in Accelerator-Rich Environments. The Journal of Supercomputing (JSC), 2022.

[7] Ajeya Naithani, Stijn Eyerman, and Lieven Eeckhout. Reliability-Aware Scheduling on Heterogeneous Multicore Processors. Int’l Symp. on High-Performance Computer Architecture (HPCA), 2017.

[8] Valon Raca, Seeun William Umboh, Eduard Mehofer, and Bernhard Scholz. Runtime and Energy Constrained Work Scheduling for Heterogeneous Systems. The Journal of Supercomputing (JSC), 2022.

[9] Michael Orr and Oliver Sinnen. Optimal Task Scheduling for Partially Heterogeneous Systems. Parallel Computing, 2021.

[10] Sven Jäger. An Improved Greedy Algorithm for Stochastic Online Scheduling on Unrelated Machines. Discrete Optimization (DO), 2023.

[11] CPP Reference. popcount. https://en.cppreference.com/w/cpp/numeric/popcount, 2024. Accessed: September 7, 2025.

[12] Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. Operating System Concepts. John Wiley & Sons, 9th edition, 2012.

[13] Ziqian Dong, Ning Liu, and Roberto Rojas-Cessa. Greedy Scheduling of Tasks With Time Constraints for Energy-Efficient Cloud-Computing Data Centers. Journal of Cloud Computing, 2015.

[14] Tsung-Wei Huang, Dian-Lun Lin, Yibo Lin, and Chun-Xun Lin. Taskflow: A General-Purpose Parallel and Heterogeneous Task Programming System. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2022.

[15] AMD. AMD Alveo U55C Product brief. https://www.amd.com/en/products/accelerators/alveo/u55c.html, 2024. Accessed: September 7, 2025.

[16] Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, and Zhiru Zhang. Allo: A Programming Model for Composable Accelerator Design. ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), 2024.

[17] Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, and Zhiru Zhang. HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing. Int’l Symp. on Field-Programmable Gate Arrays (FPGA), 2019.

[18] Pramote Kuacharoen, Mohamed Shalan, and Vincent John Mooney. A Configurable Hardware Scheduler for Real-Time Systems. Engineering of Reconfigurable Systems and Algorithms, 2003.

[19] Yi Tang and Neil W. Bergmann. A Hardware Scheduler Based on Task Queues for FPGA-Based Embedded Real-Time Systems. IEEE Trans. on Computers (TC), 2015.

[20] Danesh Derafshi, Amin Norollah, Mohsen Khosroanjam, and Hakem Beitollahi. HRHS: A High-Performance Real-Time Hardware Scheduler. IEEE Trans. on Parallel and Distributed Systems (TPDS), 2020.

[21] Amin Norollah, Zahra Kazemi, Niloufar Sayadi, Hakem Beitollahi, Mahdi Fazeli, and David Hely. Efficient Scheduling of Dependent Tasks in Many-Core Real-Time System Using a Hardware Scheduler. Workshop on High-Performance Embedded Computing, 2021.

[22] Mohsin Shan and Omer Khan. HD-CPS: Hardware-Assisted Drift-Aware Concurrent Priority Scheduler for Shared Memory Multicores. Int’l Symp. on High-Performance Computer Architecture (HPCA), 2022.

[23] Prathmesh Kallurkar and Smruti R Sarangi. SchedTask: A Hardware-Assisted Task Scheduler. Int’l Symp. on Microarchitecture (MICRO), 2017.

[24] Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, and Yibo Lin. Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System. IEEE Trans. on Parallel and Distributed Systems (TPDS), 2022.

[25] Nicole Megow, Marc Uetz, and Tjark Vredeveld. Models and Algorithms for Stochastic Online Scheduling. Mathematics of Operations Research (MOR), 2006.

[26] AMD. AMD Vitis User Guide. https://docs.amd.com/r/en-US/Vitis Libraries/User-Guide, 2024. Accessed: September 7, 2025.

[27] Xilinx. Xilinx XRT Documentation. https://xilinx.github.io/XRT/2024.1/html/index.html, 2024. Accessed: September 7, 2025.

[28] AMD. Vitis HLS User Guide. https://docs.amd.com/r/en-US/ug1399-vitis-hls, 2024. Accessed: September 7, 2025.

[29] Michael Orr and Oliver Sinnen. Optimal Task Scheduling Benefits from A Duplicate-Free State-Space. Journal of Parallel and Distributed Computing, 2020.

[30] Suhelah Sandokji and Fathy Eassa. Task Scheduling Frameworks for Heterogeneous Computing Toward Exascale. Int’l Journal of Advanced Computer Science and Applications (IJACSA), 2018.

[31] Joao VF Lima, Thierry Gautier, Vincent Danjean, Bruno Raffin, and Nicolas Maillard. Design and Analysis of Scheduling Strategies for Multi-CPU and Multi-GPU Architectures. Parallel Computing, 2015.

[32] Xu Jiang, Nan Guan, Xiang Long, Yue Tang, and Qingqiang He. Real-Time Scheduling of Parallel Tasks with Tight Deadlines. Journal of Systems Architecture, 2020.

[33] Kenli Li, Xiaoyong Tang, and Keqin Li. Energy-Efficient Stochastic Task Scheduling on Heterogeneous Computing Systems. IEEE Trans. on Parallel and Distributed Systems (TPDS), 2013.

[34] Jian Chen and Lizy K John. Efficient Program Scheduling for Heterogeneous Multi-Core Processors. Design Automation Conf. (DAC), 2009.
