EPOCH: Enabling Preemption Operation for Context Saving in Heterogeneous FPGA Systems
Pith reviewed 2026-05-23 04:46 UTC · model grok-4.3
The pith
EPOCH lets FPGA tasks stop at any clock cycle, save their full state to off-chip memory, and resume later without restarting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EPOCH is the first out-of-the-box framework that can interrupt a tenant's execution at any arbitrary clock cycle, capture its state, save this state snapshot in off-chip memory with fine-grain granularity, and later resume execution from the saved snapshot, all while automating the processes, shielding users from complexities, and synchronizing logic in a common clock domain to prevent timing violations.
What carries the argument
Automated cycle-accurate state extraction and restoration for FPGA elements including LUTs, flip-flops, BRAMs, and DSP units, performed via frame-level operations synchronized in a single clock domain.
If this is right
- FPGA tasks in multi-tenant clouds can be switched without context loss or restart, enabling the OS to balance resources the same way it does for CPU tasks.
- Context save takes 62.2 µs and restore takes 67.4 µs per frame on the tested ZynQ-XC7Z020 SoC.
- The framework works on existing FPGA hardware and tool flows without vendor modifications.
- All fundamental FPGA resources (LUTs, flip-flops, BRAMs, DSPs) are covered by the snapshot process.
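The abstract reports only per-frame times, so the cost of a full context switch depends on how many frames a tenant occupies. A minimal sketch of that arithmetic follows, assuming frames are handled one after another and that cost grows linearly with frame count; neither assumption is stated in the source, and the helper name and the 100-frame example are hypothetical.

```python
# Per-frame times reported in the abstract for the ZynQ-XC7Z020 (microseconds).
SAVE_US_PER_FRAME = 62.2
RESTORE_US_PER_FRAME = 67.4


def context_switch_cost_us(num_frames: int) -> dict:
    """Estimate save/restore cost for a tenant spanning `num_frames` frames.

    ASSUMPTION: frames are processed sequentially and cost is linear in the
    frame count. The source gives per-frame numbers only.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    save = num_frames * SAVE_US_PER_FRAME
    restore = num_frames * RESTORE_US_PER_FRAME
    return {"save_us": save, "restore_us": restore, "round_trip_us": save + restore}


# Hypothetical example: a tenant region spanning 100 frames.
print(context_switch_cost_us(100))
```

Under these assumptions the round-trip cost is simply the frame count times 129.6 µs, which gives a scheduler a rough lower bound on how long a task must run before preempting it pays off.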
Where Pith is reading between the lines
- Operating systems could treat FPGA accelerators as preemptible resources comparable to CPU threads, changing how cloud schedulers allocate hardware.
- The approach might reduce wasted FPGA time when a higher-priority task arrives, lowering the cost of sharing one chip among many users.
- Future designs could combine this state snapshot method with partial reconfiguration to move tasks between different FPGA regions without data loss.
Load-bearing premise
The FPGA fabric and standard design tools allow reading and writing the entire internal state at any chosen cycle without adding timing violations or needing changes from the chip vendor.
What would settle it
Run a design on the ZynQ-XC7Z020, stop it at a randomly chosen cycle, save and restore its state, then check whether the resumed output matches an uninterrupted run, cycle for cycle, from the interruption point onward.
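The shape of that check can be sketched in software. The toy model below is not EPOCH and does not touch real FPGA frames: `ToyDesign`, `run`, and `preemption_is_transparent` are hypothetical names, with an LFSR standing in for flip-flop state and a small list standing in for BRAM contents. It only illustrates the comparison between a preempted run and a golden uninterrupted run.

```python
import copy
import random


class ToyDesign:
    """Illustrative stand-in for a tenant design (not a model of FPGA frames)."""

    def __init__(self, seed: int = 0xACE1):
        self.lfsr = seed & 0xFFFF  # plays the role of flip-flop state
        self.mem = [0] * 8         # plays the role of BRAM contents
        self.cycle = 0

    def step(self) -> int:
        # 16-bit Fibonacci LFSR with taps 16, 14, 13, 11.
        bit = (self.lfsr ^ (self.lfsr >> 2) ^ (self.lfsr >> 3) ^ (self.lfsr >> 5)) & 1
        self.lfsr = (self.lfsr >> 1) | (bit << 15)
        self.mem[self.cycle % 8] ^= self.lfsr & 0xFF
        self.cycle += 1
        return self.lfsr ^ self.mem[self.cycle % 8]

    def snapshot(self) -> dict:
        # Deep copy so later execution cannot alter the saved state.
        return copy.deepcopy(self.__dict__)

    def restore(self, snap: dict) -> None:
        self.__dict__.update(copy.deepcopy(snap))


def run(total_cycles: int, preempt_at=None) -> list:
    """Run the design, optionally saving and restoring state at one cycle."""
    design = ToyDesign()
    outputs = []
    for c in range(total_cycles):
        if c == preempt_at:
            snap = design.snapshot()
            design = ToyDesign(seed=1)  # another tenant overwrites the state
            design.restore(snap)
        outputs.append(design.step())
    return outputs


def preemption_is_transparent(total_cycles=1000, trials=20, rng=None) -> bool:
    """Compare runs preempted at random cycles against an uninterrupted run."""
    rng = rng or random.Random(0)
    golden = run(total_cycles)
    return all(
        run(total_cycles, rng.randrange(total_cycles)) == golden
        for _ in range(trials)
    )


print(preemption_is_transparent())
```

On hardware the same structure would apply, with the golden trace taken from an uninterrupted execution and the interruption cycle drawn at random across many trials.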
Original abstract
FPGAs are increasingly used in multi-tenant cloud environments to offload compute-intensive tasks from the main CPU. The operating system (OS) plays a vital role in identifying tasks suitable for offloading and coordinating between the CPU and FPGA for seamless task execution. The OS leverages preemption to manage CPU efficiently and balance CPU time; however, preempting tasks running on FPGAs without context loss remains challenging. Despite growing reliance on FPGAs, vendors have yet to deliver a solution that fully preserves and restores task context. This paper presents EPOCH, the first out-of-the-box framework to seamlessly preserve the state of tasks running on multi-tenant cloud FPGAs. EPOCH enables interrupting a tenant's execution at any arbitrary clock cycle, capturing its state, and saving this 'state snapshot' in off-chip memory with fine-grain granularity. Subsequently, when task resumption is required, EPOCH can resume execution from the saved 'state snapshot', eliminating the need to restart the task from scratch. EPOCH automates intricate processes, shields users from complexities, and synchronizes all underlying logic in a common clock domain, mitigating timing violations and ensuring seamless handling of interruptions. EPOCH proficiently captures the state of fundamental FPGA elements, such as look-up tables, flip-flops, block--RAMs, and digital signal processing units. On real hardware, ZynQ-XC7Z020 SoC, the proposed solution achieves context save and restore operations per frame in 62.2us and 67.4us, respectively.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents EPOCH, an out-of-the-box framework for enabling preemption on multi-tenant FPGAs by allowing interruption of tenant execution at any arbitrary clock cycle, capturing the full internal state (LUTs, flip-flops, BRAMs, DSPs) into a snapshot saved in off-chip memory, and later resuming from that snapshot. It automates the process, synchronizes logic to a common clock domain to avoid timing issues, and reports context save/restore times of 62.2 µs and 67.4 µs per frame on a ZynQ-XC7Z020 device.
Significance. If the central claims hold with proper verification, EPOCH would address a key gap in FPGA cloud computing by enabling true preemption without task restart, improving resource utilization in multi-tenant settings. The use of real hardware measurements on a ZynQ device is a positive aspect, providing concrete timing data rather than simulation-only results.
major comments (2)
- [Abstract] The claim of enabling interruption 'at any arbitrary clock cycle' with cycle-accurate state capture is load-bearing for the contribution, yet the reported metrics are given only as per-frame times with no accompanying timing reports, clock-skew analysis, or description of how readback is triggered without introducing violations that could invalidate the snapshot.
- [Abstract] No verification steps, error bars, or exclusion criteria are described for confirming that a restored snapshot produces identical results to an uninterrupted execution; this is required to substantiate that the capture process itself does not corrupt state for LUTs, FFs, BRAMs, or DSPs.
minor comments (1)
- [Abstract] Typo in 'block--RAMs' (double hyphen).
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below.
- Referee: [Abstract] The claim of enabling interruption 'at any arbitrary clock cycle' with cycle-accurate state capture is load-bearing for the contribution, yet the reported metrics are given only as per-frame times with no accompanying timing reports, clock-skew analysis, or description of how readback is triggered without introducing violations that could invalidate the snapshot.
  Authors: The abstract states that EPOCH synchronizes all logic to a common clock domain to mitigate timing violations. We acknowledge that explicit timing reports, clock-skew analysis, and a description of the readback trigger mechanism would better substantiate the cycle-accurate claim. We will add these details in the revision. Revision: yes.
- Referee: [Abstract] No verification steps, error bars, or exclusion criteria are described for confirming that a restored snapshot produces identical results to an uninterrupted execution; this is required to substantiate that the capture process itself does not corrupt state for LUTs, FFs, BRAMs, or DSPs.
  Authors: The manuscript reports successful hardware execution on the ZynQ device, which implies verification occurred. To address the concern directly, we will revise the paper to include an explicit description of the verification methodology, including any error bars and criteria applied to confirm identical results for LUTs, FFs, BRAMs, and DSPs. Revision: yes.
Circularity Check
Implementation paper with no derivation chain or fitted predictions
This is an implementation and measurement paper describing a framework for FPGA context saving and preemption. The abstract and provided text contain no equations, no fitted parameters, no self-citations used as load-bearing for a derivation, and no predictions that reduce to inputs by construction. The central claims are supported by hardware measurements on ZynQ-XC7Z020 (e.g., 62.2 µs / 67.4 µs per frame), making the work self-contained against external benchmarks with no circular steps present.