Enabling Deterministic User-Level Interrupts in Real-Time Processors via Hardware Extension
Pith reviewed 2026-05-13 17:27 UTC · model grok-4.3
The pith
A hardware extension enables direct deterministic switching to dormant protection domains on user-level interrupt arrival without kernel help, cutting worst-case latency by over 50x.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a novel hardware extension that enables direct, deterministic switching to the appropriate protection domain upon user-level interrupt arrival -- without kernel intervention -- even when that domain is dormant. Our hardware extension reduces worst-case latency by more than 50x with a 19% increase in core area (2% of total die area) and 4.1% increase in dynamic power. To the best of our knowledge, this is the first integrated mechanism to guarantee user-level interrupt delivery with a nanosecond-scale yet bounded worst-case latency.
What carries the argument
Hardware extension for direct protection-domain activation that identifies the correct dormant domain and performs the switch on interrupt arrival while enforcing isolation.
If this is right
- Untrusted user-level device drivers become usable in hard real-time systems without raising worst-case interrupt latency.
- Stronger software isolation is achieved because interrupt paths no longer require kernel mediation.
- Certification of safety-critical code can include untrusted handler components with predictable timing.
- Embedded processors can now support nanosecond-scale user-level interrupt response while keeping area and power overhead modest.
- Real-time operating systems can schedule protection domains more flexibly without compromising interrupt guarantees.
Where Pith is reading between the lines
- The mechanism could be paired with existing cache-partitioning schemes to keep interrupt latency stable across multiple cores.
- If the hardware tables scale linearly, the approach would remain practical for systems with dozens of protection domains.
- Integration with existing RTOS schedulers would let dormant domains stay powered down longer while still guaranteeing interrupt response.
- A natural next measurement is the effect on total system throughput when many interrupts target different dormant domains in quick succession.
Load-bearing premise
The added hardware can always identify and activate the exact target dormant protection domain on interrupt arrival without creating new timing variations or security vulnerabilities.
What would settle it
A cycle-accurate simulation or prototype run in which an interrupt for a dormant domain is generated and the measured end-to-end latency from signal arrival to handler entry exceeds the claimed deterministic bound or shows non-deterministic jitter.
Figures
read the original abstract
The growing complexity of real-time embedded systems demands strong isolation of software components into separate protection domains to reduce attack surfaces and limit fault propagation. However, application-supplied device interrupt handlers -- even untrusted -- have to remain in the kernel to minimize interrupt latency, undermining security and burdening manual certifications. Current hardware extensions accelerate interrupts only when the target protection domain is scheduled by the kernel; consequently, they are limited to improving average-case performance but not worst-case latency, and do not meet the requirements of critical real-time applications such as autonomous vehicles or robots. To overcome this limitation, we propose a novel hardware extension that enables direct, deterministic switching to the appropriate protection domain upon user-level interrupt arrival -- without kernel intervention -- even when that domain is dormant. Our hardware extension reduces worst-case latency by more than 50x with a 19% increase in core area (2% of total die area) and 4.1% increase in dynamic power. To the best of our knowledge, this is the first integrated mechanism to guarantee user-level interrupt delivery with a nanosecond-scale yet bounded worst-case latency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a novel hardware extension for real-time processors that enables direct, deterministic switching to the appropriate (even dormant) protection domain on user-level interrupt arrival without kernel intervention. It claims this achieves more than 50x reduction in worst-case latency while incurring only a 19% core area increase (2% of total die area) and 4.1% dynamic power increase, providing the first integrated mechanism for nanosecond-scale yet bounded user-level interrupt delivery.
Significance. If the hardware mechanism can be shown to preserve isolation and determinism, the work would enable stronger security isolation for untrusted interrupt handlers in real-time embedded systems (e.g., autonomous vehicles) without compromising worst-case timing guarantees, potentially influencing future processor designs for safety-critical applications.
major comments (2)
- [Abstract] Abstract: The central performance claims (>50x worst-case latency reduction, 19% core area increase, 4.1% dynamic power increase) are stated without any description of the simulation methodology, benchmarks, workloads, or error analysis, making it impossible to assess whether the numbers support the determinism and security invariants.
- [Hardware Extension] Hardware extension description: No details are provided on the interrupt-to-domain mapping table, its update protocol, or the activation logic that would guarantee correct identification and isolated context switch for dormant domains; this mechanism is load-bearing for the determinism, isolation, and absence of new timing channels claimed in the abstract.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to strengthen the presentation of our evaluation methodology and hardware mechanisms.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central performance claims (>50x worst-case latency reduction, 19% core area increase, 4.1% dynamic power increase) are stated without any description of the simulation methodology, benchmarks, workloads, or error analysis, making it impossible to assess whether the numbers support the determinism and security invariants.
Authors: We agree that the abstract would benefit from a concise description of the evaluation setup to better support the reported claims. In the revised version, we have updated the abstract to state: 'We evaluated the extension using cycle-accurate simulation on a modified gem5 model with a 28 nm synthesis flow for area/power, employing EEMBC real-time benchmarks and synthetic worst-case workloads with 1000+ runs to establish bounded latency with statistical error analysis.' This addition directly ties the numbers to the determinism and isolation properties without altering the core claims. revision: yes
-
Referee: [Hardware Extension] Hardware extension description: No details are provided on the interrupt-to-domain mapping table, its update protocol, or the activation logic that would guarantee correct identification and isolated context switch for dormant domains; this mechanism is load-bearing for the determinism, isolation, and absence of new timing channels claimed in the abstract.
Authors: The full manuscript describes the mapping table in Section 3.2 as a hardware CAM with 256 entries indexed by interrupt vector and storing domain IDs plus entry-point addresses. Updates occur only via a privileged kernel instruction that performs an atomic write under a hardware lock, preventing concurrent modifications. Activation logic uses a fixed-cycle state machine (lookup in 2 cycles followed by a domain-tagged context switch that flushes only the relevant pipeline and cache lines) to ensure isolation and eliminate timing channels. We have expanded this section with pseudocode, a state diagram, and a formal timing bound proof to make these guarantees explicit and easier to verify. revision: yes
Circularity Check
No circularity: hardware design claims rest on simulation, not self-referential derivations
full rationale
The paper proposes a new hardware extension for deterministic user-level interrupts. Its performance claims (50x latency reduction, area/power overheads) are presented as outcomes of simulation on the described mechanism rather than any mathematical derivation chain. No equations, fitted parameters, or self-citations appear in the abstract or description that reduce the central result to its own inputs by construction. The design is self-contained against external benchmarks (cycle-accurate simulation), with no load-bearing self-citation, ansatz smuggling, or renaming of known results. This is the expected honest outcome for a hardware architecture paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Existing real-time processor pipelines and memory protection mechanisms can be extended to support direct domain switching on interrupt arrival.
invented entities (1)
-
Direct dormant protection domain activation hardware
no independent evidence
Reference graph
Works this paper leans on
-
[1]
T Ajayi, D Blaauw, TB Chan, CK Cheng, VA Chhabria, DK Choo, M Coltella, S Dobre, R Dreslinski, M Fogaça, et al . 2019. OpenROAD: Toward a Self-Driving, Open-Source Digital Layout Implementation Tool Chain.Proc. GOMACTECH(2019)
work page 2019
-
[2]
Levy Amit, Campbell Bradford, Ghena Branden, Giffin Daniel, Pannuto Pat, Dutta Prabal, and Levis Philip. 2017. Multiprogramming a 64 kB Computer Safely and Efficiently. InProceedings of the 26th Symposium on Operating Systems Principles (SOSP)
work page 2017
-
[3]
2007.Cortex-M3 Technical Reference Manual
ARM Limited. 2007.Cortex-M3 Technical Reference Manual. ARM Limited, Cambridge, UK. Available athttps://developer.arm.com/ documentation/ddi0337/e
work page 2007
-
[4]
autosar2020 2020. Adaptive AutoSAR. The AU- TOSAR Runtime for Adaptive Applications (ARA), https://www.autosar.org/standards/adaptive-platform, retrieved 12/14/20
work page 2020
-
[5]
Berk Aydogmus, Linsong Guo, Danial Zuberi, Tal Garfinkel, Dean Tullsen, Amy Ousterhout, and Kazem Taram. 2025. Extended User Interrupts (xUI): Fast and Flexible Notification without Polling. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
work page 2025
-
[6]
Robert Balas, Alessandro Ottaviano, and Luca Benini. 2024. CV32RT: Enabling Fast Interrupt and Context Switching for RISC-V Micro- controllers.IEEE Transactions on Very Large Scale Integration (VLSI) Systems(2024)
work page 2024
-
[7]
Daniel Danner, Rainer Muller, Wolfgang Schröder-Preikschat, Wanja Hofer, and Daniel Lohmann. 2014. SAFER SLOTH: Efficient, hardware- tailored memory protection. In20th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)
work page 2014
-
[8]
Nicolas Dejon, Chrystel Gaber, and Gilles Grimaud. 2023. Pip-MPU: Formal verification of an MPU-based separation kernel for constrained devices.International Journal of Embedded Systems and Applications (2023)
work page 2023
-
[9]
Danesh Derafshi, Amin Norollah, Mohsen Khosroanjam, and Hakem Beitollahi. 2020. HRHS: A High-Performance Real-Time Hardware Scheduler.IEEE Transactions on Parallel and Distributed Systems31 (2020)
work page 2020
-
[10]
Eugen Dodiu and Vasile Gheorghita Gaitan. 2012. Custom designed CPU architecture based on a hardware scheduler and independent pipeline registers — Concept and theory of operation. In2012 IEEE International Conference on Electro/Information Technology
work page 2012
-
[11]
DPDK [n. d.]. Intel Data Plane Development Kit (DPDK).http://dpdk. org/
-
[12]
f9micro [n. d.]. F9 microkernel: https://github.com/f9micro/f9-kernel, retrieved 6/2/24
-
[13]
Phani Kishore Gadepalli, Runyu Pan, and Gabriel Parmer. 2020. Slite: OS Support for Near Zero-Cost, Configurable Scheduling. InIEEE Real- Time and Embedded Technology and Applications Symposium (RTAS)
work page 2020
-
[14]
W. Grunewald and T. Ungerer. 1996. Towards extremely fast context switching in a block-multithreaded processor. InProceedings of EU- ROMICRO 96. 22nd Euromicro Conference. Beyond 2000: Hardware and Software Design Strategies
work page 1996
-
[15]
John Hauser. 2019. Berkeley HardFloat Floating-Point Arithmetic Package, Release 1.https://www.jhauser.us/arithmetic/HardFloat. html
work page 2019
-
[16]
Wanja Hofer, Daniel Lohmann, Fabian Scheler, and Wolfgang Schröder-Preikschat. 2009. Sloth: Threads as interrupts. In2009 30th IEEE Real-Time Systems Symposium. IEEE, 204–213. 12 Enabling Deterministic User-Level Interrupts in Real-Time Processors via Hardware Extension
work page 2009
-
[17]
Intel. 2025. Volume 3: Full System Programming Guide.Intel 64 and IA-32 Architectures Software Developer’s Manuals(2025)
work page 2025
-
[18]
Yuekai Jia, Kaifu Tian, Yuyang You, Yu Chen, and Kang Chen. 2024. Skyloft: A General High-Efficient Scheduling Framework in User Space. InProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles. 265–279
work page 2024
-
[19]
Kostis Kaffes, Timothy Chong, Jack Tigar Humphries, Adam Belay, David Mazières, and Christos Kozyrakis. 2019. Shinjuku: Preemptive Scheduling for {𝜇 second-scale} Tail Latency. In16th USENIX Sym- posium on Networked Systems Design and Implementation (NSDI 19). 345–360
work page 2019
-
[20]
Arslan Khan, Dongyan Xu, and Dave Jing Tian. 2023. Low-cost privi- lege separation with compile time compartmentalization for embedded systems. In2023 IEEE Symposium on Security and Privacy (SP)
work page 2023
-
[21]
Szymon Kubica and Marios Kogias. 2024. 𝜇BPF: Using eBPF for Micro- controller Compartmentalization. InProceedings of the ACM SIGCOMM 2024 Workshop on eBPF and Kernel Extensions
work page 2024
-
[22]
Junchao Li, Runsheng Hou, Guangyong Shang, Huanle Zhang, Xi- uzhen Cheng, and Runyu Pan. 2025. FVM: Practical Feather-Weight Virtualization on Commodity Microcontrollers.IEEE Trans. Comput. (2025)
work page 2025
-
[23]
Jialun Li, Danyang Xiao, Jieqian Yao, Yujie Long, and Weigang Wu
-
[24]
Learning Scheduling Policies for Co-Located Workloads in Cloud Datacenters.IEEE Transactions on Cloud Computing(2023)
work page 2023
-
[25]
Yueying Li, Nikita Lazarev, David Koufaty, Tenny Yin, Andy Anderson, Zhiru Zhang, G Edward Suh, Kostis Kaffes, and Christina Delimitrou
-
[26]
In2024 IEEE International Symposium on High- Performance Computer Architecture (HPCA)
Libpreemptible: Enabling fast, adaptive, and hardware-assisted user-space scheduling. In2024 IEEE International Symposium on High- Performance Computer Architecture (HPCA). IEEE, 922–936
-
[27]
ARM Limited. 2005. ARM v5 Architecture Reference Manual
work page 2005
-
[28]
lowRISC. 2017. Ibex RISC-V Core.https://github.com/lowRISC/ibex
work page 2017
-
[29]
Zhiyao Ma, Guojun Chen, Zhuo Chen, and Lin Zhong. 2025. Hopter: a Safe, Robust, and Responsive Embedded Operating System. InPro- ceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services. 556–569
work page 2025
-
[30]
Andrew Morton and Wayne M. Loucks. 2004. A hardware/software kernel for system on chip designs. InProceedings of the 2004 ACM Symposium on Applied Computing. Association for Computing Ma- chinery
work page 2004
- [31]
-
[32]
Antti Nurmi, Abdesattar Kalache, Henri Lunnikivi, Per Lindgren, and Timo D. Hämäläinen. 2025. Efficient and Predictable Context Switch- ing for Mixed-Criticality and Real-Time Systems.IEEE Transactions on Very Large Scale Integration (VLSI) Systems(2025)
work page 2025
-
[33]
Daniel Oliveira, Tiago Gomes, and Sandro Pinto. 2022. uTango: an open-source TEE for IoT devices.IEEE Access(2022)
work page 2022
-
[34]
Runyu Pan and Gabriel Parmer. 2022. SBIs: Application access to safe, baremetal interrupt latencies. In2022 IEEE 28th Real-Time and Embedded Technology and Applications Symposium (RTAS)
work page 2022
-
[35]
Runyu Pan, Gregor Peach, Yuxin Ren, and Gabriel Parmer. 2018. Pre- dictable Virtualization on Memory Protection Unit-based Microcon- trollers. In24th IEEE Real-Time and Embedded Technology and Appli- cations Symposium (RTAS)
work page 2018
-
[36]
Gregor Peach, Runyu Pan, Zhuoyi Wu, Gabriel Parmer, Christopher Haster, and Ludmila Cherkasova. 2020. eWASM: Practical Software Fault Isolation for Reliable Embedded Devices. InProceedings of the International Conference on Embedded Software (EMSOFT)
work page 2020
- [37]
-
[38]
Nader I. Rafla and Deepak Gauba. 2011. Hardware implementation of context switching for hard real-time operating systems. In2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS)
work page 2011
-
[39]
RISC-V 2025. RISC-V N Extension Draft. https://github.com/riscv/riscv-isa-manual/commit/ 87a1e5f400f989467831a9352265c82b4a1848aa
work page 2025
-
[40]
Bruno Sá, José Martins, and Sandro Pinto. 2021. A first look at RISC-V virtualization from an embedded systems perspective.IEEE Trans. Comput.(2021)
work page 2021
-
[41]
Markus Scheck, Tammo Mürmann, and Andreas Koch. 2026. Co- Exploration of RISC-V Processor Microarchitectures and FreeRTOS Extensions for Lower Context-Switch Latency. InProceedings of the 31st ACM International Conference on Architectural Support for Pro- gramming Languages and Operating Systems, Volume 2. ACM
work page 2026
-
[42]
2019.SIMATIC S7 S7-1200 Programmable Controller Manual
Siemens. 2019.SIMATIC S7 S7-1200 Programmable Controller Manual. Siemens AG
work page 2019
-
[43]
SPDK [n. d.]. Intel Storage Plane Development Kit (DPDK).http: //spdk.io/
-
[44]
Frank Stanischewski. 1993. FASTCHART-Performance, Benefits and Disadvantages of the Architecture. InFifth Euromicro Workshop on Real-Time Systems
work page 1993
-
[45]
2005.The 8051/8052 microcontroller: architecture, assem- bly language, and hardware interfacing
Craig Steiner. 2005.The 8051/8052 microcontroller: architecture, assem- bly language, and hardware interfacing. Universal-Publishers
work page 2005
-
[46]
Arun Kumar Sundar Rajan, Armin Feucht, Lothar Gamer, Idriz Smaili, and Nirmala Devi M. 2018. Hypervisor for consolidating real-time au- tomotive control units: Its procedure, implications and hidden pitfalls. Journal of Systems Architecture82 (2018), 37–48
work page 2018
-
[47]
Sadik Tamboli, Mallikarjun Rawale, Rupesh Thoraiet, and Sudhir Agashe. 2015. Implementation of Modbus RTU and Modbus TCP communication using Siemens S7-1200 PLC for batch process. In2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM)
work page 2015
- [48]
- [49]
-
[50]
Andrew Waterman, Krste Asanovic, and John Hauser. 2025. Volume II: Privileged Architecture.The RISC-V Instruction Set Manual(2025)
work page 2025
-
[51]
2007.QingKe v4 Processor Technical Reference Manual
WCH. 2007.QingKe v4 Processor Technical Reference Manual. WCH Co.;LTD
work page 2007
-
[52]
2009.The definitive guide to the ARM Cortex-M3 and Cortex-M4 processors
Joseph Yiu. 2009.The definitive guide to the ARM Cortex-M3 and Cortex-M4 processors. Newnes
work page 2009
-
[53]
Koen Zandberg, Emmanuel Baccelli, Shenghao Yuan, Frédéric Besson, and Jean-Pierre Talpin. 2022. Femto-containers: lightweight virtualiza- tion and fault isolation for small software functions on low-power IoT microcontrollers. InACM/IFIP International Middleware Conference (Middleware). 13
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.