THEMIS: Time, Heterogeneity, and Energy Minded Scheduling for Fair Multi-Tenant Use in FPGAs
Pith reviewed 2026-05-24 02:43 UTC · model grok-4.3
The pith
The THEMIS scheduling algorithm improves fairness in multi-tenant FPGAs by 24.2 to 98.4 percent over prior algorithms. It does so by accounting for non-uniform task latency, energy overhead, and heterogeneous reconfigurable regions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present THEMIS as a scheduling solution that does three things. (i) It ensures spatiotemporal fairness by considering both spatial and temporal resource allocation, rather than assuming uniform task latency. (ii) It incorporates energy by adjusting scheduling intervals and accounting for energy overhead. (iii) It accounts for heterogeneous regions and for the partial reconfiguration constraints on merging and splitting regions. On a Xilinx Zedboard (XC7Z020), this yields fairness improvements of 24.2 to 98.4 percent over prior algorithms and enables a trade-off of 55.3 times in energy versus 69.3 times in fairness.
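The following minimal Python sketch makes the difference between slot-count fairness and spatiotemporal fairness concrete. It is not the THEMIS metric, whose closed form is not reproduced in this review. As a hypothetical assumption, it treats a tenant's usage as the sum of region area × task latency over its tasks, and it scores equality with Jain's fairness index. The `Task` fields, units, and workload are illustrative only.

```python
"""Toy comparison of slot-count fairness vs. spatiotemporal fairness.

Hypothetical model (not the THEMIS metric): a tenant's usage is the sum
over its tasks of (PR-region area x task latency).
"""
from dataclasses import dataclass


@dataclass
class Task:
    tenant: str
    area: float      # assumed: fraction of reconfigurable fabric occupied
    latency: float   # assumed: task execution time in ms


def jain_index(values: list[float]) -> float:
    """Jain's fairness index: 1.0 means perfectly equal shares."""
    n = len(values)
    total = sum(values)
    squares = sum(v * v for v in values)
    return (total * total) / (n * squares) if squares > 0 else 1.0


def per_tenant(tasks: list[Task], spatiotemporal: bool) -> dict[str, float]:
    usage: dict[str, float] = {}
    for t in tasks:
        # Slot counting treats every scheduled task as one unit of service,
        # which implicitly assumes uniform task latency.
        amount = t.area * t.latency if spatiotemporal else 1.0
        usage[t.tenant] = usage.get(t.tenant, 0.0) + amount
    return usage


# Hypothetical workload: both tenants get two slots, but B's tasks run 4x longer.
tasks = [
    Task("A", 0.25, 10.0), Task("A", 0.25, 10.0),
    Task("B", 0.25, 40.0), Task("B", 0.25, 40.0),
]

slot_share = per_tenant(tasks, spatiotemporal=False)
st_share = per_tenant(tasks, spatiotemporal=True)
print(round(jain_index(list(slot_share.values())), 3))
print(round(jain_index(list(st_share.values())), 3))
```

With these numbers, slot counting reports perfect fairness (1.0), while the area × time view reports about 0.735 because tenant B holds the fabric four times longer per task. A uniform-latency assumption hides exactly this kind of gap.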
What carries the argument
The THEMIS algorithm itself, which combines spatiotemporal fairness, energy-aware adjustment of scheduling intervals, and the constraints imposed by heterogeneous partial reconfiguration regions.
If this is right
- Prior scheduling methods are outperformed in fairness by 24.2 to 98.4 percent.
- Energy efficiency can be traded against fairness, at reported factors of 55.3× (energy) versus 69.3× (fairness).
- The approach respects real FPGA limits on region merging and splitting that earlier work overlooked.
- Cloud providers gain a method to balance fairness and energy in multi-tenant FPGA setups.
Where Pith is reading between the lines
- Extending the energy and heterogeneity logic to other accelerators like GPUs might yield similar fairness gains.
- Production workloads with varying task latencies could amplify or reduce the reported improvements.
- Scaling the evaluation to multi-board or larger FPGA systems would test if the trade-offs remain stable.
- The method could influence how partial reconfiguration is managed in future FPGA cloud services.
Load-bearing premise
Results obtained on a single Xilinx Zedboard (XC7Z020), with its particular reconfiguration rules and energy model, are assumed to carry over to other FPGAs and to real cloud workloads.
What would settle it
The generalization would be disproved by either of two experiments showing that the fairness gains disappear: running the same workloads on a different FPGA platform, or replaying real multi-tenant traces with non-uniform latencies.
Original abstract
Using correct design metrics and understanding the limitations of the underlying technology is critical to developing effective scheduling algorithms. Unfortunately, existing scheduling techniques used incorrect metrics and made unrealistic assumptions for fair scheduling of multi-tenant FPGAs, where each tenant should receive approximately the same share of resources both spatially and temporally. This paper introduces an enhanced fair scheduling algorithm for multi-tenant FPGA use. It addresses these metric and assumption issues with three specific improvements. First, our method ensures spatiotemporal fairness by considering both spatial and temporal aspects, addressing the limitation of prior work that assumed uniform task latency. Second, we incorporate energy considerations into fairness by adjusting scheduling intervals and accounting for energy overhead, thereby balancing energy efficiency with fairness. Third, we acknowledge overlooked aspects of FPGA multi-tenancy, including heterogeneous regions and the constraints on dynamically merging and splitting partially reconfigurable regions. We develop and evaluate our improved fair scheduling algorithm with these three enhancements. Inspired by the Greek goddess of law and personification of justice, we name our fair scheduling solution THEMIS: Time, Heterogeneity, and Energy Minded Scheduling. We used the Xilinx Zedboard XC7Z020 to quantify our approach's savings. Compared to previous algorithms, our improved scheduling algorithm enhances fairness by 24.2–98.4% and allows a trade-off between 55.3× in energy and 69.3× in fairness. The paper thus informs cloud providers about future scheduling optimizations for fairness, along with related challenges and opportunities.
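The abstract's second improvement adjusts scheduling intervals to account for energy overhead, which implies a tension that a toy model can illustrate. The sketch below is hypothetical and makes four assumptions:

- One PR region is shared round-robin among the tenants.
- Each interval boundary triggers one reconfiguration with a fixed energy cost.
- Unfairness is measured as the worst-case lag behind an ideal fluid share.
- All constants are placeholders, not Zedboard measurements.

It is not the paper's energy model.

```python
"""Toy sweep: longer scheduling intervals save reconfiguration energy but
let tenants drift further from their fair share. All constants are assumed."""
import math

N_TENANTS = 4         # hypothetical number of tenants sharing one PR region
HORIZON_MS = 1000.0   # hypothetical scheduling horizon
E_RECONF_MJ = 2.0     # hypothetical energy per partial reconfiguration (mJ)
P_BASE_W = 0.5        # hypothetical baseline power; 1 W = 1 mJ/ms


def energy_mj(tau_ms: float) -> float:
    """Baseline energy plus one reconfiguration per interval boundary."""
    reconfigs = math.ceil(HORIZON_MS / tau_ms)
    return P_BASE_W * HORIZON_MS + reconfigs * E_RECONF_MJ


def worst_lag_ms(tau_ms: float) -> float:
    """Worst-case lag behind an ideal fluid share under round-robin.

    The tenant served last waits (N - 1) * tau with no service, while a
    fluid scheduler would have given it 1/N of that time.
    """
    return (N_TENANTS - 1) * tau_ms / N_TENANTS


for tau in (1.0, 10.0, 100.0):
    print(f"tau={tau:6.1f} ms  energy={energy_mj(tau):7.1f} mJ  "
          f"worst lag={worst_lag_ms(tau):6.2f} ms")
```

In this toy setting, lengthening the interval from 1 ms to 100 ms cuts energy from 2500 mJ to 520 mJ, while the worst-case lag grows from 0.75 ms to 75 ms. The paper's 55.3× versus 69.3× figures come from its own measurements, not from a model like this one.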
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces THEMIS, an enhanced fair scheduling algorithm for multi-tenant FPGAs that incorporates spatiotemporal fairness (addressing non-uniform task latencies), energy overheads in scheduling intervals, and constraints from heterogeneous partially reconfigurable regions and dynamic merging/splitting. It evaluates the approach on a Xilinx Zedboard XC7Z020, claiming 24.2–98.4% fairness improvements over prior algorithms and energy-fairness trade-offs of 55.3× versus 69.3×.
Significance. If the measured gains prove robust, the work could usefully inform cloud FPGA schedulers by highlighting the interplay of energy, heterogeneity, and PR-region constraints. The explicit treatment of non-uniform latencies and region merging is a concrete advance over prior metric choices.
major comments (2)
- [§5] §5 (Evaluation) and abstract: the headline claims of 24.2–98.4% fairness improvement and 55.3×/69.3× trade-offs rest exclusively on measurements from a single Xilinx Zedboard XC7Z020, under its specific PR-region sizes, reconfiguration latencies, and energy model. No sensitivity study or results on larger platforms (Virtex UltraScale+, AWS F1) are supplied, which directly undermines the generalization of the quantitative results.
- [§4] §4 (Algorithm description): no pseudocode, explicit baseline definitions, or closed-form fairness metric is provided, making it impossible to verify how the three claimed enhancements (spatiotemporal fairness, energy adjustment, heterogeneity handling) produce the reported deltas; this is load-bearing for the central contribution.
minor comments (2)
- [Abstract] Abstract and §1: the phrase 'enhances fairness between 24.2–98.4%' is ambiguous; clarify whether this is a relative improvement, an absolute delta, or a range across workloads.
- [§5] Figure captions and §5: workload details (task mix, arrival rates, region sizes) are referenced but not tabulated; add a summary table for reproducibility.
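The ambiguity raised in the first minor comment matters because the same pair of scores yields different percentages depending on how improvement is computed. Here is a small worked example using made-up fairness scores, not values from the paper:

```python
# Hypothetical fairness scores on a higher-is-better index in [0, 1];
# these values are illustrative and are not taken from the paper.
baseline, improved = 0.50, 0.62

absolute_delta = improved - baseline          # difference in index units
relative_gain = absolute_delta / baseline     # change relative to the baseline

# For a lower-is-better unfairness measure, "improvement" usually means a reduction.
unfair_before, unfair_after = 0.40, 0.30
relative_reduction = (unfair_before - unfair_after) / unfair_before

print(f"absolute: +{absolute_delta * 100:.1f} points")
print(f"relative: +{relative_gain:.1%}")
print(f"reduction in unfairness: {relative_reduction:.1%}")
```

In this example, a 12-point absolute gain reads as a 24.0% relative gain. Stating which convention is used, and whether the range spans workloads or baselines, removes the ambiguity.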
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our contributions. We address each major point below and indicate the planned revisions.
Referee: [§5] §5 (Evaluation) and abstract: the headline claims of 24.2–98.4% fairness improvement and 55.3×/69.3× trade-offs rest exclusively on measurements from a single Xilinx Zedboard XC7Z020, under its specific PR-region sizes, reconfiguration latencies, and energy model. No sensitivity study or results on larger platforms (Virtex UltraScale+, AWS F1) are supplied, which directly undermines the generalization of the quantitative results.
Authors: We agree that all quantitative results are derived from the Xilinx Zedboard XC7Z020. We chose this platform because it is widely available and supports partial reconfiguration with heterogeneous regions, which allows direct measurement of the spatiotemporal and energy effects under realistic constraints. The relative improvements (24.2–98.4%) arise from the new metrics rather than from platform-specific tuning. In the revised manuscript, we will add a dedicated subsection in §5 discussing how the fairness metric, energy-interval adjustment, and region-merging logic scale to larger devices such as Virtex UltraScale+ and AWS F1, including a qualitative analysis of reconfiguration latency and region-size effects. New experimental results on those platforms are not feasible with our current hardware access. The quantitative claims will therefore remain tied to the evaluated board, while the algorithmic contributions will be framed more generally. revision: partial
Referee: [§4] §4 (Algorithm description): no pseudocode, explicit baseline definitions, or closed-form fairness metric is provided, making it impossible to verify how the three claimed enhancements (spatiotemporal fairness, energy adjustment, heterogeneity handling) produce the reported deltas; this is load-bearing for the central contribution.
Authors: We accept this criticism. The current manuscript describes the three enhancements in prose but omits pseudocode, precise baseline definitions, and the closed-form expression for the spatiotemporal fairness metric. In the revised version we will insert (i) a new Algorithm 1 box containing the full THEMIS scheduling procedure, (ii) explicit definitions of the three prior algorithms used for comparison, and (iii) the mathematical formulation of the fairness metric that incorporates both spatial occupancy and temporal latency variation. These additions will make the source of the reported deltas verifiable without altering the experimental results. revision: yes
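A first-order estimate can frame the authors' promised qualitative discussion of how reconfiguration latency and region size scale to larger devices. The sketch below rests on two assumptions: reconfiguration time is dominated by streaming the partial bitstream through the configuration port, and one reconfiguration precedes each scheduling interval. The port throughput, bitstream sizes, and interval length are assumed placeholders, not measured values from the paper or from any specific device.

```python
"""First-order reconfiguration overhead estimate (all constants assumed)."""


def reconfig_ms(bitstream_bytes: int, port_bytes_per_s: float) -> float:
    """Time to stream a partial bitstream through the configuration port."""
    return 1e3 * bitstream_bytes / port_bytes_per_s


def overhead_fraction(bitstream_bytes: int, port_bytes_per_s: float,
                      interval_ms: float) -> float:
    """Share of wall-clock time lost when one reconfiguration precedes each interval."""
    t = reconfig_ms(bitstream_bytes, port_bytes_per_s)
    return t / (interval_ms + t)


PORT_BYTES_PER_S = 400e6   # assumed: 32-bit port at 100 MHz, i.e. 400 MB/s peak
INTERVAL_MS = 50.0         # hypothetical scheduling interval

# Hypothetical partial bitstream sizes for a small and a large PR region.
for label, size in (("small region", 500_000), ("large region", 20_000_000)):
    t = reconfig_ms(size, PORT_BYTES_PER_S)
    share = overhead_fraction(size, PORT_BYTES_PER_S, INTERVAL_MS)
    print(f"{label}: {t:.2f} ms per reconfiguration, {share:.1%} overhead")
```

Under these assumptions, a 40× larger partial bitstream turns a 2.4% overhead into 50%. An interval tuned on a small board therefore may not carry over unchanged to larger regions.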
Circularity Check
No circularity; claims are empirical hardware measurements, not self-referential definitions or fitted predictions
The paper's central claims (fairness improvements of 24.2–98.4% and energy/fairness trade-offs) are presented as direct outcomes of running the THEMIS scheduler on a Xilinx Zedboard XC7Z020 under measured partial-reconfiguration and energy conditions. No equations define a metric in terms of the scheduler output and then treat the output as an independent prediction; no self-citation chain is invoked to justify uniqueness or force the result; the algorithm improvements are described as explicit handling of spatial/temporal/energy/heterogeneity factors rather than ansatzes smuggled from prior author work. The derivation chain is therefore self-contained against external benchmarks (the physical device runs).
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: FPGA partial reconfiguration regions have fixed heterogeneity and merging/splitting constraints that any scheduler must respect.
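The domain assumption can be expressed as a placement check. The sketch below is a hypothetical model, not the THEMIS implementation. In it:

- Regions have heterogeneous resource capacities.
- A task fits only if every resource type fits.
- Two regions can be used as one only when the floorplan already defines a larger region covering exactly that set and none of its parts is busy.

Region names, capacities, and the merge table are illustrative.

```python
"""Hypothetical placement check for heterogeneous, fixed PR regions."""
from typing import Optional

# Capacity of each region by resource type (illustrative numbers).
CAPACITY: dict[str, dict[str, int]] = {
    "RP0": {"LUT": 4000, "DSP": 20, "BRAM": 10},
    "RP1": {"LUT": 4000, "DSP": 0, "BRAM": 20},
    "RP2": {"LUT": 8000, "DSP": 40, "BRAM": 10},
    "RP3": {"LUT": 8000, "DSP": 40, "BRAM": 10},
    # Larger regions that the (hypothetical) floorplan defines at design time.
    "RP01": {"LUT": 8000, "DSP": 20, "BRAM": 30},
    "RP23": {"LUT": 16000, "DSP": 80, "BRAM": 20},
}

# Only these sets of base regions may be used together as one larger region.
MERGEABLE: dict[frozenset[str], str] = {
    frozenset({"RP0", "RP1"}): "RP01",
    frozenset({"RP2", "RP3"}): "RP23",
}

BASE_REGIONS = ("RP0", "RP1", "RP2", "RP3")


def fits(demand: dict[str, int], region: str) -> bool:
    """Heterogeneity check: every requested resource type must fit."""
    cap = CAPACITY[region]
    return all(cap.get(kind, 0) >= amount for kind, amount in demand.items())


def merge_target(parts: set[str], busy: set[str]) -> Optional[str]:
    """Return the predefined merged region for `parts`, or None if not allowed."""
    if parts & busy:
        return None  # a region still running a task cannot be reconfigured
    return MERGEABLE.get(frozenset(parts))


def place(demand: dict[str, int], busy: set[str]) -> Optional[str]:
    """Prefer a free base region; otherwise fall back to a predefined merge."""
    for region in BASE_REGIONS:
        if region not in busy and fits(demand, region):
            return region
    for parts in MERGEABLE:
        target = merge_target(set(parts), busy)
        if target is not None and fits(demand, target):
            return target
    return None


big_bram_task = {"LUT": 6000, "DSP": 10, "BRAM": 25}
print(place(big_bram_task, busy=set()))          # merged region
print(place(big_bram_task, busy={"RP1"}))        # no legal placement
print(merge_target({"RP0", "RP2"}, busy=set()))  # not in the merge table
```

Consider a task that needs more BRAM than any single region offers. The check falls back to the predefined merged region `RP01`. If `RP1` is busy, no placement exists, and a pair not listed in the merge table, such as `RP0` and `RP2`, is never offered.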
Reference graph
Works this paper leans on
[1] A. Khawaja, J. Landgraf, R. Prakash, M. Wei, E. Schkufza, and C. J. Rossbach, “Sharing, Protection, and Compatibility for Reconfigurable Fabric with AmorphOS,” in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018, pp. 107–127.
[2] D. Korolija, T. Roscoe, and G. Alonso, “Do OS abstractions make sense on FPGAs?” in 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 2020, pp. 991–1010.
[3] S. A. Fahmy, K. Vipin, and S. Shreejith, “Virtualized FPGA Accelerators for Efficient Cloud Computing,” in 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2015, pp. 430–435.
[4] Y. Zha and J. Li, “Virtualizing FPGAs in the Cloud,” in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020, pp. 845–858.
[5] J. Ma, G. Zuo, K. Loughlin, X. Cheng, Y. Liu, A. M. Eneyew, Z. Qi, and B. Kasikci, “A Hypervisor for Shared-Memory FPGA Platforms,” in Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 2020, pp. 827–844.
[6] A. Vaishnav, K. D. Pham, D. Koch, and J. Garside, “Resource Elastic Virtualization for FPGAs using OpenCL,” in IEEE International Conference on Field Programmable Logic and Applications (FPL), 2018, pp. 111–1117.
[7] J. Mohan, A. Phanishayee, J. Kulkarni, and V. Chidambaram, “Looking beyond GPUs for DNN Scheduling on Multi-Tenant Clusters,” in 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), 2022, pp. 579–596.
[8] F. Yu, D. Wang, L. Shangguan, M. Zhang, C. Liu, and X. Chen, “A Survey of Multi-tenant Deep Learning Inference on GPU,” arXiv preprint arXiv:2203.09040, 2022.
[9] D. Shue, M. J. Freedman, and A. Shaikh, “Performance Isolation and Fairness for Multi-tenant Cloud Storage,” in 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), 2012, pp. 349–362.
[10] V. Narasayya, S. Das, M. Syamala, B. Chandramouli, and S. Chaudhuri, “SQLVM: Performance Isolation in Multi-Tenant Relational Database-as-a-Service,” in CIDR 2013, 2013.
[11] A. Duhamel and S. Pillement, “QoS Aware Design-Time/Run-Time Manager for FPGA-Based Embedded Systems,” in Design and Architecture for Signal and Image Processing: 15th International Workshop, DASIP 2022, Budapest, Hungary, June 20–22, 2022, Proceedings. Springer, 2022, pp. 96–107.
[12] C. Kao, “Benefits of Partial Reconfiguration,” Xcell Journal, vol. 55, pp. 65–67, 2005.
Footnote 5: https://github.com/aamalik3/THEMIS.git
[13] D. Koch, Partial Reconfiguration on FPGAs: Architectures, Tools and Applications. Springer Science & Business Media, 2012, vol. 153.
[14] A. Mehrabi, D. J. Sorin, and B. C. Lee, “Spatiotemporal Strategies for Long-Term FPGA Resource Management,” in 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 2022, pp. 198–209.
[15] M. Huang, D. Wu, C. H. Yu, Z. Fang, M. Interlandi, T. Condie, and J. Cong, “Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale,” in Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016, pp. 456–469.
[16] O. Knodel and R. G. Spallek, “RC3E: Provision and Management of Reconfigurable Hardware Accelerators in a Cloud Environment,” arXiv preprint arXiv:1508.06843, 2015.
[17] Xilinx Inc., UG947 Vivado Design Suite Tutorial: Dynamic Function eXchange, v2021.2, Apr. 2022.
[18] A. A. Malik, E. Karabulut, and A. Aysu, “EPOCH: Enabling Preemption Operation for Context Saving in Heterogeneous FPGA Systems,” arXiv preprint arXiv:2501.16205, 2025.
[19] E. Karabulut, C. Yuvarajappa, M. I. Shaik, S. Potluri, A. Awad, and A. Aysu, “PR Crisis: Analyzing and Fixing Partial Reconfiguration in Multi-Tenant Cloud FPGAs,” in ACM Workshop on Attacks and Solutions in Hardware Security (ASHES), 2022, pp. 101–106.
[20] M. A. Rihani et al., “Dynamic and Partial Reconfiguration Power Consumption Runtime Measurements Analysis for ZYNQ SoC Devices,” in IEEE International Symposium on Wireless Communication Systems, 2016, pp. 592–596.
[21] N. Kulkarni, G. Gonzalez-Pumariega, A. Khurana, C. A. Shoemaker, C. Delimitrou, and D. H. Albonesi, “Cuttlesys: Data-driven resource management for interactive services on Reconfigurable Multicores,” in 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2020, pp. 650–664.
[22] S. M. Zahedi and B. C. Lee, “REF: Resource Elasticity Fairness with Sharing Incentives for Multiprocessors,” ACM SIGPLAN Notices, vol. 49, no. 4, pp. 145–160, 2014.
[23] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, “Dominant Resource Fairness: Fair Allocation of Multiple Resource Types,” in 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI 11), 2011.
[24] D. C. Parkes, A. D. Procaccia, and N. Shah, “Beyond Dominant Resource Fairness: Extensions, Limitations, and Indivisibilities,” ACM Transactions on Economics and Computation (TEAC), vol. 3, no. 1, pp. 1–22, 2015.
[25] E. L. Hahne, “Round-Robin Scheduling for Max-Min Fairness in Data Network,” IEEE Journal on Selected Areas in Communications, vol. 9, no. 7, pp. 1024–1039, 1991.
[26] J. L. Falk, “Schedule-Induced Polydipsia as a Function of Fixed Interval Length,” Journal of the Experimental Analysis of Behavior, vol. 9, no. 1, pp. 37–39, 1966.
[27] B. Reagen, R. Adolf, Y. S. Shao, G.-Y. Wei, and D. Brooks, “MachSuite: Benchmarks for Accelerator Design and Customized Architectures,” in 2014 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2014, pp. 110–119.
[28] K. Vipin and S. A. Fahmy, “ZyCAP: Efficient Partial Reconfiguration Management on the Xilinx Zynq,” IEEE Embedded Systems Letters, vol. 6, no. 3, pp. 41–44, 2014.
[29] K. Jozwik, H. Tomiyama, S. Honda, and H. Takada, “A Novel Mechanism for Effective Hardware Task Preemption in Dynamically Reconfigurable Systems,” in 2010 International Conference on Field Programmable Logic and Applications. IEEE, 2010, pp. 352–355.
[30] B. Sultana, A. Ullah, A. A. Malik, A. Zahir, P. Reviriego, F. B. Muslim, N. Ullah, and W. Ahmad, “VR-ZYCAP: A Versatile Resource-Level ICAP Controller for ZYNQ SOC,” Electronics, vol. 10, no. 8, p. 899, 2021.
[31] M. Liu, W. Kuehn, Z. Lu, and A. Jantsch, “Run-time Partial Reconfiguration Speed Investigation and Architectural Design Space Exploration,” in 2009 International Conference on Field Programmable Logic and Applications. IEEE, 2009, pp. 498–502.
[32] E. J. McDonald, “Runtime FPGA Partial Reconfiguration,” in 2008 IEEE Aerospace Conference. IEEE, 2008, pp. 1–7.
[33] K. Vipin and S. A. Fahmy, “A High Speed Open Source Controller for FPGA Partial Reconfiguration,” in 2012 International Conference on Field-Programmable Technology. IEEE, 2012, pp. 61–66.
[34] F. Duhem, F. Muller, and P. Lorenzini, “FaRM: Fast Reconfiguration Manager for Reducing Reconfiguration Time Overhead on FPGA,” in Reconfigurable Computing: Architectures, Tools and Applications: 7th International Symposium, ARC 2011, Belfast, UK, March 23–25, 2011, Proceedings 7. Springer, 2011, pp. 253–260.
[35] S. Liu, R. N. Pittman, A. Forin, and J.-L. Gaudiot, “Achieving Energy Efficiency through Runtime Partial Reconfiguration on Reconfigurable Systems,” ACM Transactions on Embedded Computing Systems (TECS), vol. 12, no. 3, pp. 1–21, 2013.
[36] A. R. Bucknall and S. A. Fahmy, “ZyPR: End-to-End Build Tool and Runtime Manager for Partial Reconfiguration of FPGA SoCs at the Edge,” ACM Transactions on Reconfigurable Technology and Systems, 2023.
[37] A. A. Malik, E. Karabulut, A. Awad, and A. Aysu, “Enabling secure and efficient sharing of accelerators in expeditionary systems,” Journal of Hardware and Systems Security, vol. 8, no. 2, pp. 94–112, 2024.
[38] A. A. Malik, A. Ullah, A. Zahir, et al., “Isolation design flow effectiveness evaluation methodology for Zynq SoCs,” Electronics, 2020.
[39] Amazon EC, “Amazon Web Services,” available: http://aws.amazon.com/es/ec2/ (November 2012), 2015.
[40] M. Copeland, J. Soh, A. Puca, M. Manning, and D. Gollob, Microsoft Azure and Cloud Computing. Springer, 2015.
[41] B. Gupta, P. Mittal, and T. Mufti, “A review on Amazon Web Service (AWS), Microsoft Azure & Google Cloud Platform (GCP) services,” in Proceedings of the 2nd International Conference on ICT for Digital, Smart, and Sustainable Development (ICIDSSD 2020), 27–28 February 2020, Jamia Hamdard, New Delhi, India, 2021.
[42] G. Dessouky, A.-R. Sadeghi, and S. Zeitouni, “SoK: Secure FPGA multi-tenancy in the cloud: Challenges and opportunities,” in 2021 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2021.