EPOCH: Enabling Preemption Operation for Context Saving in Heterogeneous FPGA Systems

Arsalan Ali Malik; Aydin Aysu; Emre Karabulut

arxiv: 2501.16205 · v3 · pith:35HIG2FXnew · submitted 2025-01-27 · 💻 cs.DC · cs.AR

Pith reviewed 2026-05-23 04:46 UTC · model grok-4.3

keywords FPGA preemption · context saving · multi-tenant FPGA · state snapshot · task resumption · cloud FPGA · heterogeneous systems · Zynq SoC

The pith

EPOCH lets FPGA tasks stop at any clock cycle, save their full state to off-chip memory, and resume later without restarting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EPOCH as a ready-to-use framework that interrupts FPGA tasks in shared cloud systems at any chosen clock cycle, reads out the complete internal state of logic blocks, memory, and processing units, and writes a snapshot to external memory. When the task needs to continue, the framework writes the saved values back so execution picks up exactly where it left off. This addresses the missing preemption support on FPGAs that already exists for CPUs, allowing the operating system to manage FPGA resources without forcing tasks to start over after each interruption. The method automates the extraction and restoration steps and keeps all logic in one clock domain to avoid timing problems during the save and restore operations.

Core claim

EPOCH is the first out-of-the-box framework that can interrupt a tenant's execution at any arbitrary clock cycle, capture its state, save this state snapshot in off-chip memory with fine-grain granularity, and later resume execution from the saved snapshot, all while automating the processes, shielding users from complexities, and synchronizing logic in a common clock domain to prevent timing violations.

What carries the argument

Automated cycle-accurate state extraction and restoration for FPGA elements including LUTs, flip-flops, BRAMs, and DSP units, performed via frame-level operations synchronized in a single clock domain.

If this is right

FPGA tasks in multi-tenant clouds can be switched without context loss or restart, enabling the OS to balance resources the same way it does for CPU tasks.
Context save takes 62.2 µs and restore takes 67.4 µs per frame on the tested Zynq XC7Z020 device.
The framework works on existing FPGA hardware and tool flows without vendor modifications.
All fundamental FPGA resources (LUTs, flip-flops, BRAMs, DSPs) are covered by the snapshot process.
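
As a rough illustration of what the per-frame figures imply, the sketch below scales them to a multi-frame snapshot. It assumes that cost grows linearly with the number of frames and that there is no fixed setup overhead, neither of which this page confirms. The function name and the frame count of 100 are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
# Per-frame latencies reported in the abstract for the Zynq XC7Z020.
SAVE_US_PER_FRAME = 62.2
RESTORE_US_PER_FRAME = 67.4


def context_switch_estimate_us(n_frames: int) -> dict:
    """Estimate save, restore, and round-trip time in microseconds.

    Assumption (not from the paper): cost scales linearly with the frame
    count and there is no fixed setup overhead.
    """
    if n_frames < 0:
        raise ValueError("n_frames must be non-negative")
    save = SAVE_US_PER_FRAME * n_frames
    restore = RESTORE_US_PER_FRAME * n_frames
    return {"save_us": save, "restore_us": restore, "round_trip_us": save + restore}


if __name__ == "__main__":
    # 100 frames is a hypothetical region size, used only as an example input.
    estimate = context_switch_estimate_us(100)
    print({name: round(value, 1) for name, value in estimate.items()})
```

Under these assumptions a scheduler could compare the round-trip estimate against the expected remaining runtime of a task before deciding whether preemption is worthwhile.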

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Operating systems could treat FPGA accelerators as preemptible resources comparable to CPU threads, changing how cloud schedulers allocate hardware.
The approach might reduce wasted FPGA time when a higher-priority task arrives, lowering the cost of sharing one chip among many users.
Future designs could combine this state snapshot method with partial reconfiguration to move tasks between different FPGA regions without data loss.

Load-bearing premise

The FPGA fabric and standard design tools allow reading and writing the entire internal state at any chosen cycle without adding timing violations or needing changes from the chip vendor.

What would settle it

Running a design on the Zynq XC7Z020, stopping it at a random cycle, saving and restoring the state, then checking whether the resumed output matches the uninterrupted run at the same number of cycles afterward.
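
The shape of that check can be sketched in software. The toy model below is not EPOCH and touches no FPGA: a 16-bit LFSR stands in for a tenant design, and `snapshot` and `restore` are hypothetical stand-ins for the readback and write-back steps. It only illustrates the comparison itself: a run that is interrupted and resumed must end in the same state as an uninterrupted run of the same length.

```python
import random


class ToyTenant:
    """Deterministic stand-in for a tenant circuit: a 16-bit Fibonacci LFSR."""

    def __init__(self, seed: int = 0xACE1):
        self.state = seed & 0xFFFF
        self.cycle = 0

    def step(self) -> None:
        # Feedback taps at bits 16, 14, 13, 11 (a common maximal-length choice).
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        self.cycle += 1

    def run(self, cycles: int) -> None:
        for _ in range(cycles):
            self.step()

    def snapshot(self) -> dict:
        # Hypothetical analogue of saving state to off-chip memory.
        return {"state": self.state, "cycle": self.cycle}

    def restore(self, snap: dict) -> None:
        # Hypothetical analogue of writing the saved state back.
        self.state = snap["state"]
        self.cycle = snap["cycle"]


def resumed_matches_reference(total_cycles: int, rng: random.Random) -> bool:
    """Compare an interrupted-and-resumed run with an uninterrupted one."""
    reference = ToyTenant()
    reference.run(total_cycles)

    stop_at = rng.randrange(total_cycles + 1)  # arbitrary preemption point
    preempted = ToyTenant()
    preempted.run(stop_at)
    snap = preempted.snapshot()

    resumed = ToyTenant(seed=0)  # fresh instance holding unrelated contents
    resumed.restore(snap)
    resumed.run(total_cycles - stop_at)
    return resumed.snapshot() == reference.snapshot()


if __name__ == "__main__":
    rng = random.Random(0)
    print(all(resumed_matches_reference(1000, rng) for _ in range(50)))
```

On real hardware the comparison would be made on the design's observable outputs or a second readback rather than on a Python dictionary, and the stop point would be drawn across many trials so that no particular cycle is favored.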

Figures

Figures reproduced from arXiv: 2501.16205 by Arsalan Ali Malik, Aydin Aysu, Emre Karabulut.

**Figure 2.** The configuration logic block (CLB) of Xilinx ...

**Figure 3.** Block diagram of EPOCH on the Xilinx Zynq SoC with EPOCH ...

**Figure 4.** The frame format for two different configurations: (a) frame readback ...

**Figure 5.** Visualization of the frame and configuration bit layout within the Xilinx Zynq ...

**Figure 6.** The two FPGA layouts employed in our experiments for (a) basic and ...

**Figure 7.** The operational workflow of EPOCH consists of four steps. (1) The ...

Original abstract

FPGAs are increasingly used in multi-tenant cloud environments to offload compute-intensive tasks from the main CPU. The operating system (OS) plays a vital role in identifying tasks suitable for offloading and coordinating between the CPU and FPGA for seamless task execution. The OS leverages preemption to manage CPU efficiently and balance CPU time; however, preempting tasks running on FPGAs without context loss remains challenging. Despite growing reliance on FPGAs, vendors have yet to deliver a solution that fully preserves and restores task context. This paper presents EPOCH, the first out-of-the-box framework to seamlessly preserve the state of tasks running on multi-tenant cloud FPGAs. EPOCH enables interrupting a tenant's execution at any arbitrary clock cycle, capturing its state, and saving this 'state snapshot' in off-chip memory with fine-grain granularity. Subsequently, when task resumption is required, EPOCH can resume execution from the saved 'state snapshot', eliminating the need to restart the task from scratch. EPOCH automates intricate processes, shields users from complexities, and synchronizes all underlying logic in a common clock domain, mitigating timing violations and ensuring seamless handling of interruptions. EPOCH proficiently captures the state of fundamental FPGA elements, such as look-up tables, flip-flops, block--RAMs, and digital signal processing units. On real hardware, ZynQ-XC7Z020 SoC, the proposed solution achieves context save and restore operations per frame in 62.2us and 67.4us, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

EPOCH gives measured save/restore times on Zynq hardware for FPGA preemption, but the arbitrary-cycle capture claim needs more verification evidence.

EPOCH is a framework for saving and restoring FPGA task state in multi-tenant cloud settings. It automates state capture across LUTs, flip-flops, BRAMs, and DSPs, syncs everything to one clock domain, and reports 62.2 µs save and 67.4 µs restore per frame on a Zynq XC7Z020. The main new piece is the out-of-the-box automation that hides the details from users while claiming to support interruption at any cycle and resumption from off-chip memory.

The real hardware numbers are the clearest contribution here, as they turn a known problem into something with concrete performance data. That part is useful for anyone building schedulers or OS support for FPGAs in clouds.

The soft spot is verification. The abstract gives no timing reports, no description of how capture is triggered without introducing skew, and no checks that a restored run matches an uninterrupted one. Frame-based extraction may also limit how arbitrary the stop point actually is. Without those steps, the central claim that the snapshot stays valid rests on unshown details.

This paper is for people working on FPGA cloud systems and heterogeneous scheduling. A reader who needs practical numbers on context handling will find something to use. It deserves peer review because it ships actual device measurements and a working framework, even if the review would likely ask for more evidence on correctness and edge cases. I would bring it to a reading group to discuss the implementation choices.

Referee Report

2 major / 1 minor

Summary. The paper presents EPOCH, an out-of-the-box framework for enabling preemption on multi-tenant FPGAs by allowing interruption of tenant execution at any arbitrary clock cycle, capturing the full internal state (LUTs, flip-flops, BRAMs, DSPs) into a snapshot saved in off-chip memory, and later resuming from that snapshot. It automates the process, synchronizes logic to a common clock domain to avoid timing issues, and reports context save/restore times of 62.2 µs and 67.4 µs per frame on a Zynq XC7Z020 device.

Significance. If the central claims hold with proper verification, EPOCH would address a key gap in FPGA cloud computing by enabling true preemption without task restart, improving resource utilization in multi-tenant settings. The use of real hardware measurements on a Zynq device is a positive aspect, providing concrete timing data rather than simulation-only results.

major comments (2)

[Abstract] The claim of enabling interruption 'at any arbitrary clock cycle' with cycle-accurate state capture is load-bearing for the contribution, yet the reported metrics are given only as per-frame times with no accompanying timing reports, clock-skew analysis, or description of how readback is triggered without introducing violations that could invalidate the snapshot.
[Abstract] No verification steps, error bars, or exclusion criteria are described for confirming that a restored snapshot produces identical results to an uninterrupted execution; this is required to substantiate that the capture process itself does not corrupt state for LUTs, FFs, BRAMs, or DSPs.

minor comments (1)

[Abstract] Typo in 'block--RAMs' (double dash).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below.

Referee: [Abstract] The claim of enabling interruption 'at any arbitrary clock cycle' with cycle-accurate state capture is load-bearing for the contribution, yet the reported metrics are given only as per-frame times with no accompanying timing reports, clock-skew analysis, or description of how readback is triggered without introducing violations that could invalidate the snapshot.

Authors: The abstract states that EPOCH synchronizes all logic to a common clock domain to mitigate timing violations. We acknowledge that explicit timing reports, clock-skew analysis, and a description of the readback trigger mechanism would better substantiate the cycle-accurate claim. We will add these details in the revision. revision: yes
Referee: [Abstract] No verification steps, error bars, or exclusion criteria are described for confirming that a restored snapshot produces identical results to an uninterrupted execution; this is required to substantiate that the capture process itself does not corrupt state for LUTs, FFs, BRAMs, or DSPs.

Authors: The manuscript reports successful hardware execution on the Zynq device, which implies verification occurred. To address the concern directly, we will revise the paper to include an explicit description of the verification methodology, including any error bars and criteria applied to confirm identical results for LUTs, FFs, BRAMs, and DSPs. revision: yes

Circularity Check

0 steps flagged

Implementation paper with no derivation chain or fitted predictions

This is an implementation and measurement paper describing a framework for FPGA context saving and preemption. The abstract and provided text contain no equations, no fitted parameters, no self-citations used as load-bearing for a derivation, and no predictions that reduce to inputs by construction. The central claims are supported by hardware measurements on the Zynq XC7Z020 (e.g., 62.2 µs / 67.4 µs per frame), making the work self-contained against external benchmarks with no circular steps present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a systems implementation paper; no free parameters, mathematical axioms, or new postulated entities are introduced.


[1] [1]

Dynamic Function eXchange, UG909 (v2023.2),

AMD Xilinix Inc, “Dynamic Function eXchange, UG909 (v2023.2),” , 2023. [Online]. Available: https://docs.xilinx.com/r/ en-US/ug909-vivado-partial-reconfiguration

work page 2023

[2] [2]

Spatiotemporal Strategies for Long-Term FPGA Resource Management,

A. Mehrabi, D. J. Sorin, and B. C. Lee, “Spatiotemporal Strategies for Long-Term FPGA Resource Management,” in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2022, pp. 198–209

work page 2022

[3] [3]

Do OS abstractions make sense on FPGAs?,

D. Korolija, T. Roscoe, and G. Alonso, “Do OS abstractions make sense on FPGAs?,” in 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) , 2020, pp. 991–1010

work page 2020

[4] [4]

Sharing, Protection, and Compatibility for Reconfigurable Fabric with AmorphOS,

A. Khawaja, J. Landgraf, R. Prakash, M. Wei, E. Schkufza, and C. J. Rossbach, “Sharing, Protection, and Compatibility for Reconfigurable Fabric with AmorphOS,” in USENIX Symposium on Operating Systems Design and Implementation (OSDI) , 2018, pp. 107–127

work page 2018

[5] [5]

THEMIS: Time, Heterogeneity, and Energy Minded Scheduling for Fair Multi-Tenant Use in FPGAs ,

E. Karabulut, A. A. Malik, A. Awad, and A. Aysu, “ THEMIS: Time, Heterogeneity, and Energy Minded Scheduling for Fair Multi-Tenant Use in FPGAs ,” IEEE Transactions on Computers , no. 01, pp. 1–14, May 2025. [Online]. Available: https://doi.ieeecomputersociety.org/10. 1109/TC.2025.3566874

work page arXiv 2025

[6] [6]

Context save and restore of partial reconfiguration regions for Xilinx FPGAs,

Eckert, Marcel and Meyer, Dominik and Klauer, Bernd, “Context save and restore of partial reconfiguration regions for Xilinx FPGAs,” in 2019 14th International Symposium on Reconfigurable Communication- centric Systems-on-Chip (ReCoSoC) . IEEE, 2019, pp. 5–12

work page 2019

[7] [7]

A hypervisor for shared-memory FPGA platforms,

J. Ma, G. Zuo, K. Loughlin, X. Cheng, Y . Liu, A. M. Eneyew, Z. Qi, and B. Kasikci, “A hypervisor for shared-memory FPGA platforms,” inACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASLPOS) , 2020, pp. 827–844

work page 2020

[8] [8]

Hardware Checkpointing and Productive Debugging Flows for FPGAs,

S. Attia, “Hardware Checkpointing and Productive Debugging Flows for FPGAs,” Ph.D. dissertation, University of Toronto, 2022

work page 2022

[9] [9]

Preemptive hardware multitasking in ReconOS,

M. Happe, A. Traber, and A. Keller, “Preemptive hardware multitasking in ReconOS,” in Applied Reconfigurable Computing: 11th International Symposium, ARC 2015, Bochum, Germany, April 13-17, 2015, Proceed- ings 11. Springer, 2015, pp. 79–90

work page 2015

[10] [10]

Feel free to interrupt: Safe task stopping to enable FPGA checkpointing and context switching,

S. Attia and V . Betz, “Feel free to interrupt: Safe task stopping to enable FPGA checkpointing and context switching,” ACM Transactions on Reconfigurable Technology and Systems (TRETS) , vol. 13, no. 1, pp. 1–27, 2020

work page 2020

[11] [11]

Stop and look: A novel checkpointing and debugging flow for FPGAs,

Attia, Sameh and Betz, Vaughn, “Stop and look: A novel checkpointing and debugging flow for FPGAs,” IEEE Transactions on Computers , vol. 71, no. 10, pp. 2513–2526, 2021

work page 2021

[12] [12]

7 Series FPGAs Configurable Logic Block, UG474, v1. 13.1,

UG474, Series FPGAs Configurable Logic Block, “7 Series FPGAs Configurable Logic Block, UG474, v1. 13.1,” San Jose, CA, USA , pp. 1–74, 2016

work page 2016

[13] [13]

7 Series FPGAs Configuration User Guide, UG470 (v1. 11),

UG470, Series FPGAs Configuration User Guide, “7 Series FPGAs Configuration User Guide, UG470 (v1. 11),” San Jose, CA, USA , 2016

work page 2016

[14] [14]

A hybrid approach to FPGA configuration scrubbing,

A. Stoddard, A. Gruwell, P. Zabriskie, and M. J. Wirthlin, “A hybrid approach to FPGA configuration scrubbing,” IEEE Transactions on Nuclear Science, vol. 64, no. 1, pp. 497–503, 2016

work page 2016

[15] [15]

UG947 Vivado Design Suite Tutorial Dynamic Function eXchange , Xilinx Inc, 4 2022, v2021.2

work page 2022

[16] [16]

BITMAN: A tool and API for FPGA bitstream manipulations,

K. D. Pham, E. Horta, and D. Koch, “BITMAN: A tool and API for FPGA bitstream manipulations,” inDesign, Automation & Test in Europe Conference & Exhibition (DATE), 2017 . IEEE, 2017, pp. 894–897

work page 2017

[17] [17]

Zynq-7000 All Programmable SoC Overview,

PL, Programmable Logic, “Zynq-7000 All Programmable SoC Overview,” Feb, 2012

work page 2012

[18] [18]

The RISC-V instruction set manual,

A. Waterman, Y . Lee, D. Patterson, K. Asanovic, V . I. U. level Isa, A. Waterman, Y . Lee, and D. Patterson, “The RISC-V instruction set manual,” Volume I: User-Level ISA’, version, vol. 2, 2014

work page 2014

[19] [19]

Machsuite: Benchmarks for accelerator design and customized architectures,

B. Reagen, R. Adolf, Y . S. Shao, G.-Y . Wei, and D. Brooks, “Machsuite: Benchmarks for accelerator design and customized architectures,” in IEEE International Symposium on Workload Characterization (ISWC) , 2014, pp. 110–119

work page 2014

[20] [20]

Conte, B-Con/crypto-algorithms

B. Conte, B-Con/crypto-algorithms. Conte, Brad, 12 2020. [Online]. Available: https://github.com/B-Con/crypto-algorithms

work page 2020

[21] [21]

An overview of common benchmarks,

R. P. Weicker, “An overview of common benchmarks,” Computer, vol. 23, no. 12, pp. 65–75, 1990

work page 1990

[22] [22]

Fast-fourier lattice-based compact signatures over NTRU,

Fouque, PA and Hoffstein, J and Kirchner, P and Lyubashevsky, V and Pornin, T and Prest, T and Ricosset, T and Seiler, G and Whyte, W and Zhang, Z and others, “Fast-fourier lattice-based compact signatures over NTRU,” 2019

[23] AMD Xilinx Inc, “Vivado Design Suite Properties Reference Guide, UG912 (v2023.2),” 2023. [Online]. Available: https://docs.xilinx.com/r/en-US/ug912-vivado-properties/SNAPPING_MODE

[24] S. Tapp, “Configuration Readback Capture in UltraScale FPGAs,” Xilinx All Programmable, www.xilinx.com, XAPP1230 (v1.1), pp. 1–24, 2015

[25] Y. Hara, H. Tomiyama, S. Honda, H. Takada, and K. Ishii, “CHStone: A benchmark program suite for practical C-based high-level synthesis,” in 2008 IEEE International Symposium on Circuits and Systems, 2008

[26] Y. Zhou, U. Gupta, S. Dai, R. Zhao, N. Srivastava, H. Jin, J. Featherston, Y.-H. Lai, G. Liu, G. A. Velasquez, et al., “Rosetta: A realistic high-level synthesis benchmark suite for software programmable FPGAs,” in Proceedings of the ACM/SIGDA International Symposium on FPG...

[27] Y. Zhou, U. Gupta, S. Dai, R. Zhao, N. Srivastava, H. Jin, J. Featherston, Y.-H. Lai, G. Liu, G. A. Velasquez, et al., “Rosetta: A realistic benchmark suite for software programmable FPGAs,” in Suite of Embedded Applications and Kernels Workshop, 2015

[28] P. Goswami, M. Shahshahani, and D. Bhatia, “MLSBench: A Benchmark Set for Machine Learning based FPGA HLS Design Flows,” in 2022 IEEE 13th Latin America Symposium on Circuits and System, 2022

[29] P. Jamieson and J. Rose, “A Verilog RTL synthesis tool for heterogeneous FPGAs,” in FPL. IEEE, 2005, pp. 305–310

[30] M. C. Hansen, H. Yalcin, and J. Hayes, “Unveiling the ISCAS-85 benchmarks: A case study in reverse engineering,” Design & Test of Computers, vol. 16, no. 3, pp. 72–80, 1999

[31] S. R. Das, S. Mukherjee, E. M. Petriu, M. H. Assaf, M. Sahinoglu, and W.-B. Jone, “An improved fault simulation approach based on Verilog with application to ISCAS 12 benchmark circuits,” in 2006 IEEE Instrumentation and Measurement Technology Conference Proceedings, 2006, pp. 1902–1907

[32] S. Liu, R. N. Pittman, and A. Forin, “Minimizing partial reconfiguration overhead with fully streaming DMA engines and intelligent ICAP controller,” in FPGA, 2010, p. 292

[33] K. Vipin and S. A. Fahmy, “DyRACT: A partial reconfiguration enabled accelerator and test platform,” in 2014 24th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2014, pp. 1–7

[34] K. Vipin and S. A. Fahmy, “ZyCAP: Efficient partial reconfiguration management on the Xilinx Zynq,” IEEE Embedded Systems Letters, vol. 6, no. 3, pp. 41–44, 2014

[35] Y. Xiao, D. Park, A. Butt, H. Giesen, Z. Han, R. Ding, N. Magnezi, R. Rubin, and A. DeHon, “Reducing FPGA compile time with separate compilation for FPGA building blocks,” in 2019 International Conference on Field-Programmable Technology (ICFPT). IEEE, 2019, pp. 153–161

[36] A. Kulkarni, V. Kizheppatt, and D. Stroobandt, “MiCAP: A custom reconfiguration controller for dynamic circuit specialization,” in 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig). IEEE, 2015, pp. 1–6

[37] A. Kulkarni and D. Stroobandt, “MiCAP-Pro: A high speed custom reconfiguration controller for Dynamic Circuit Specialization,” Design Automation for Embedded Systems, vol. 20, no. 4, pp. 341–359, 2016

[38] W. Guohua, L. Dongming, W. Fengzhou, A. Adetomi, and T. Arslan, “A tiny and multifunctional ICAP controller for dynamic partial reconfiguration system,” in 2017 NASA/ESA Conference on Adaptive Hardware and Systems (AHS). IEEE, 2017, pp. 71–76

[39] B. Sultana, A. Ullah, A. A. Malik, A. Zahir, P. Reviriego, F. B. Muslim, N. Ullah, and W. Ahmad, “VR-ZYCAP: a versatile resource-level ICAP controller for ZYNQ SOC,” Electronics, vol. 10, no. 8, p. 899, 2021

[40] Xilinx Inc, “Fast Partial Reconfiguration, XAPP1338 (v1.0),” 2019. [Online]. Available: https://docs.xilinx.com/r/en-US/xapp1338-fast-partial-reconfiguration-pci-express/Summary

[41] P. A. Peterson, “Cryptkeeper: Improving security with encrypted RAM,” in 2010 IEEE International Conference on Technologies for Homeland Security (HST). IEEE, 2010, pp. 120–126

[42] S. Gueron, “Memory encryption for general-purpose processors,” IEEE Security & Privacy, vol. 14, no. 6, pp. 54–62, 2016

[43] N. A. Anagnostopoulos, S. Katzenbeisser, J. Chandy, and F. Tehranipoor, “An overview of DRAM-based security primitives,” Cryptography, vol. 2, no. 2, p. 7, 2018

[44] A. A. Malik, A. Ullah, A. Zahir, A. Qamar, S. K. Khattak, and P. Reviriego, “Isolation design flow effectiveness evaluation methodology for Zynq SoCs,” Electronics, vol. 9, no. 5, p. 814, 2020

[45] A. A. Malik, E. Karabulut, A. Awad, and A. Aysu, “Enabling secure and efficient sharing of accelerators in expeditionary systems,” Journal of Hardware and Systems Security, vol. 8, no. 2, pp. 94–112, 2024

[46] A. A. Malik, H. Mihir, and A. Aysu, “Craft: Characterizing and root-causing fault injection threats at pre-silicon,” arXiv preprint arXiv:2503.03877, 2025

[47] N. Nasir, A. Ali Malik, I. Tahir, A. Masood, and N. Riaz, “Ephemeral Key-based Hybrid Hardware Obfuscation,” in 2022 19th International Bhurban Conference on Applied Sciences and Technology (IBCAST), 2022, pp. 646–652.
