WATSON: Leveraging Data Watchpoints for Shadow Stack Protection on Embedded Systems
Pith reviewed 2026-05-12 01:30 UTC · model grok-4.3
The pith
Data watchpoints can be configured to write-protect shadow stacks, delivering system-wide return-address defense on embedded processors with under 8 percent overhead.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WATSON configures the address-matching logic inside the data watchpoint unit to trap any write that targets the shadow-stack memory region. This hardware check prevents an attacker from overwriting stored return addresses even when the processor is handling interrupts or exceptions. The resulting protection works on ARM Cortex-M devices, adds 7.33 percent slowdown on BEEBS and 1.81 percent on CoreMark-Pro, shows negligible cost on real applications and exception paths, and increases code size by at most 2.11 percent while remaining compatible with compiler forward-edge CFI.
What carries the argument
Data watchpoint unit configured to match the shadow-stack address range and block write attempts to it.
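The address-matching idea can be illustrated with a small host-side model. This is a minimal sketch, not WATSON's implementation: it assumes a comparator that stores a reference address and ignores a configurable number of low-order address bits, which is how Armv7-M style watchpoint comparators are commonly described. The names `Comparator` and `write_traps` and the region address are hypothetical.

```python
from dataclasses import dataclass

WRITE = "write"
READ = "read"


@dataclass(frozen=True)
class Comparator:
    """Toy model of one address comparator (hypothetical, for illustration)."""
    comp: int       # reference address, assumed aligned to 2**mask_bits
    mask_bits: int  # number of low-order address bits ignored when comparing

    def matches(self, addr: int) -> bool:
        ignore = (1 << self.mask_bits) - 1
        return (addr & ~ignore) == (self.comp & ~ignore)


def write_traps(unit: Comparator, addr: int, access: str) -> bool:
    """Return True when the access would raise a watchpoint event."""
    return access == WRITE and unit.matches(addr)


# Assumed 1 KiB shadow-stack region starting at 0x2000F000 (2**10 bytes).
shadow = Comparator(comp=0x2000F000, mask_bits=10)

print(write_traps(shadow, 0x2000F004, WRITE))  # True: write inside the region
print(write_traps(shadow, 0x2000F400, WRITE))  # False: first byte past the region
print(write_traps(shadow, 0x2000F004, READ))   # False: reads are not trapped
```

Because the comparison is a mask over low-order bits, the protected region in this model must be a power-of-two size and aligned to that size.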
If this is right
- Return-address overwrites aimed at the shadow stack are blocked by the hardware watchpoint trap.
- Runtime overhead stays below 8 percent on standard embedded benchmark suites.
- Exception and interrupt handling incur negligible extra cost.
- Worst-case code-size growth is 2.11 percent.
- The protection composes directly with existing compiler forward-edge integrity passes.
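The first consequence in the list above can be sketched as a generic guarded shadow stack. This is a simplified model under stated assumptions, not the paper's code: writes are allowed only inside a short window opened by the instrumented prologue, and the epilogue compares the main-stack return address with the protected copy. `GuardedShadowStack`, `WatchpointFault`, and the addresses are hypothetical.

```python
class WatchpointFault(Exception):
    """Raised when a write reaches the protected region while it is armed."""


class GuardedShadowStack:
    def __init__(self):
        self._slots = []
        self._writable = False  # False models the watchpoint being armed

    def _store(self, value):
        if not self._writable:
            raise WatchpointFault("write to shadow stack while protection is armed")
        self._slots.append(value)

    def on_call(self, return_address):
        # Instrumented prologue: open a short write window, push, re-arm.
        self._writable = True
        try:
            self._store(return_address)
        finally:
            self._writable = False

    def on_return(self, return_address_on_main_stack):
        # Instrumented epilogue: check the main-stack value against the copy.
        expected = self._slots.pop()
        return expected == return_address_on_main_stack

    def attacker_write(self, value):
        # Any store outside the instrumented window hits the armed check.
        self._store(value)


stack = GuardedShadowStack()
stack.on_call(0x08000123)
try:
    stack.attacker_write(0xDEADBEEF)
except WatchpointFault as fault:
    print("blocked:", fault)
print(stack.on_return(0x08000123))  # True: protected copy is intact
```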
Where Pith is reading between the lines
- Architectures that expose similar address-matching debug units could adopt the same protection pattern with little redesign.
- Developers could share watchpoint resources between security and debugging by adding a lightweight arbitration layer.
- The technique opens a route to combine shadow-stack protection with other low-level hardware monitors already present on the chip.
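The arbitration layer mentioned in the second item is speculative, and the source does not describe one. As a rough sketch of what such a layer might look like, the model below hands out a fixed pool of comparators on an all-or-nothing basis. The pool size of four follows the referee report's description of the DWT; `ComparatorPool`, the owner names, and the requested counts are hypothetical.

```python
class ComparatorPool:
    """Hypothetical arbiter over a fixed number of watchpoint comparators."""

    def __init__(self, total=4):
        self.owner = [None] * total  # None marks a free comparator

    def reserve(self, who, count):
        free = [i for i, current in enumerate(self.owner) if current is None]
        if len(free) < count:
            return []  # all-or-nothing: the caller must handle a refusal
        granted = free[:count]
        for index in granted:
            self.owner[index] = who
        return granted

    def release(self, who):
        self.owner = [None if current == who else current for current in self.owner]


pool = ComparatorPool()
print(pool.reserve("shadow-stack", 2))  # [0, 1]
print(pool.reserve("debugger", 3))      # []: only two comparators remain
print(pool.reserve("debugger", 2))      # [2, 3]
```

Refusing partial grants keeps the security client from running with silently reduced coverage, which is one possible design choice among several.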
Load-bearing premise
That data watchpoints can be reserved exclusively for shadow-stack protection without conflicts from other debug uses or across varied embedded applications and interrupt handlers.
What would settle it
A working control-flow hijack that overwrites a return address inside the protected shadow-stack region without triggering a watchpoint fault, or a benchmark run showing the watchpoint configuration itself produces unacceptable slowdown or compatibility breakage.
Original abstract
Embedded and Internet-of-Things (IoT) devices play a critical role in modern life. Their software and firmware, often developed in memory-unsafe languages like C, are susceptible to memory safety vulnerabilities that can lead to control-flow hijacking attacks. Shadow stack is a defense mechanism against control-flow hijacking that targets return addresses. However, existing shadow stack solutions for embedded systems have the following limitations. First, they lack system-wide protection, particularly for interrupts and exceptions. Second, they introduce high performance overhead. Third, they depend on security extensions like a trusted execution environment, which are not universally available on embedded devices. Finally, they rely on hardware features that have inherent configurable constraints, which pose compatibility challenges when integrating security mechanisms that require similar hardware support. To overcome these limitations, we present WATSON, an efficient and effective shadow stack solution. It leverages a standard hardware debug unit named data watchpoints for shadow stack protection on embedded systems. To prevent unauthorized access to the shadow stack, WATSON leverages the address-matching features of the debug unit to enforce the write protection of the shadow stack. Additionally, WATSON is compatible with compiler options to enforce forward-edge control-flow integrity. We implemented a prototype of WATSON on the ARM Cortex-M architecture, and the concept also applies to other platforms. The introduced overhead is 7.33% and 1.81% on BEEBS and CoreMark-Pro benchmarks, respectively. We also evaluate WATSON on exception handling and two real-world applications, observing negligible performance overhead and a worst-case code size overhead of 2.11%. Furthermore, our security evaluation demonstrates that WATSON effectively prevents attacks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents WATSON, a shadow stack mechanism for embedded systems that uses the Data Watchpoint and Trace (DWT) unit of ARM Cortex-M processors to enforce write protection on a shadow stack region via address matching. It claims system-wide coverage (including interrupts/exceptions), compatibility with compiler-enforced forward-edge CFI, low overhead (7.33% on BEEBS, 1.81% on CoreMark-Pro), negligible overhead on exception handling and two real-world applications, worst-case code-size overhead of 2.11%, and effective prevention of control-flow hijacking attacks, all without requiring TEEs or non-standard hardware extensions.
Significance. If the central claims hold, WATSON would provide a practical defense against return-address corruption on resource-constrained embedded and IoT devices that lack TEE support, using only ubiquitous debug hardware. The concrete benchmark numbers, exception-handling measurements, and attack-prevention evaluation constitute a strength; the approach avoids self-referential parameter fitting and grounds the design in standard Cortex-M DWT comparators.
major comments (2)
- [§3 and §4] §3 (Design) and §4 (Implementation): the claim of exclusive DWT configuration for shadow-stack write protection is load-bearing for both the reported overhead figures and the 'no special extensions' assertion. The manuscript does not describe how the four DWT comparators are reserved, how state is saved/restored on debugger attachment or concurrent profiling, or how the protection handler distinguishes legitimate instrumented writes (e.g., via PC range check) from unauthorized ones when interrupts or other debug features are active.
- [§5] §5 (Evaluation): the security evaluation and overhead measurements (Table 2, BEEBS/CoreMark-Pro rows) assume uncontended DWT resources. No experiment or analysis is provided for the case in which a pre-existing watchpoint or range comparator is already in use, which would either disable protection or require additional context-switch overhead, directly affecting the 'negligible on real-world applications' and 'system-wide' claims.
minor comments (2)
- [Abstract] The abstract states that 'the concept also applies to other platforms' but supplies no concrete mapping or constraints for architectures whose debug units differ in comparator count or range semantics.
- [Figure 3] Figure 3 (or equivalent architecture diagram) would benefit from an explicit call-out of the DWT comparator allocation and the exact exception vector that invokes the protection handler.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments identify important clarifications needed in the design and evaluation sections. We address each point below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [§3 and §4] §3 (Design) and §4 (Implementation): the claim of exclusive DWT configuration for shadow-stack write protection is load-bearing for both the reported overhead figures and the 'no special extensions' assertion. The manuscript does not describe how the four DWT comparators are reserved, how state is saved/restored on debugger attachment or concurrent profiling, or how the protection handler distinguishes legitimate instrumented writes (e.g., via PC range check) from unauthorized ones when interrupts or other debug features are active.
  Authors: We acknowledge the need for greater detail on DWT resource management. WATSON configures the four DWT comparators exclusively at boot for shadow-stack write protection by setting address-match comparators and enabling the corresponding watchpoint events; this is performed once in the initialization routine before any application code runs. On debugger attachment, standard Cortex-M debug behavior disables or overrides DWT settings, and WATSON restores the protection configuration upon detachment via the debug monitor exception path. The protection handler distinguishes authorized writes by checking that the faulting PC lies within the pre-defined range of instrumented call sites (a simple bounds check on the PC value read from the exception frame); unauthorized writes trigger the full protection response. We will expand §3 and §4 with these mechanisms and a diagram of the initialization sequence.
  Revision: yes
- Referee: [§5] §5 (Evaluation): the security evaluation and overhead measurements (Table 2, BEEBS/CoreMark-Pro rows) assume uncontended DWT resources. No experiment or analysis is provided for the case in which a pre-existing watchpoint or range comparator is already in use, which would either disable protection or require additional context-switch overhead, directly affecting the 'negligible on real-world applications' and 'system-wide' claims.
  Authors: We agree that the reported numbers reflect the uncontended case, which matches the intended deployment on embedded devices where DWT is not simultaneously used for profiling. In contended scenarios, WATSON can fall back to using the remaining available comparators or disable protection entirely, incurring either reduced coverage or the overhead of a context-switch to save/restore DWT state. We will add an explicit limitations paragraph in §5 discussing these trade-offs and their impact on the 'system-wide' claim, while noting that the primary target environments (bare-metal IoT firmware without concurrent debug) remain unaffected. No new benchmark runs are feasible within the revision timeline, but the qualitative analysis will be included.
  Revision: partial
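The PC bounds check described in the first response can be sketched as follows. This is a minimal model assuming the handler reads the stacked program counter from the exception frame and compares it with a single contiguous range of instrumented code. The range constants, `ExceptionFrame`, and the `"resume"`/`"halt"` outcomes are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class ExceptionFrame(NamedTuple):
    pc: int  # program counter stacked on exception entry


# Assumed address range holding the instrumented code (end is exclusive).
INSTRUMENTED_START = 0x08001000
INSTRUMENTED_END = 0x08002000


def is_authorized_write(frame: ExceptionFrame) -> bool:
    """Simple bounds check on the stacked PC."""
    return INSTRUMENTED_START <= frame.pc < INSTRUMENTED_END


def handle_watchpoint(frame: ExceptionFrame) -> str:
    # Authorized writes continue; anything else gets the protection response.
    return "resume" if is_authorized_write(frame) else "halt"


print(handle_watchpoint(ExceptionFrame(pc=0x08001234)))  # resume
print(handle_watchpoint(ExceptionFrame(pc=0x08004000)))  # halt
```

A check of this kind is only as strong as the assumption that an attacker cannot redirect execution into the instrumented range with a chosen store target, which is one reason the referee asks for the mechanism to be spelled out.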
Circularity Check
No significant circularity; design and claims rest on external hardware features and benchmark measurements
The paper describes an implementation of shadow-stack protection via standard ARM Cortex-M data watchpoint hardware, with compatibility to existing compiler CFI options. No equations, fitted parameters, or predictions appear in the abstract or described sections. Claims of overhead (7.33% on BEEBS, 1.81% on CoreMark-Pro) and security are tied to direct prototype measurements on external benchmarks and applications rather than any self-referential reduction or self-citation chain. The central premise (using address-matching watchpoints for write protection) is justified by the documented behavior of the DWT unit, which is an independent hardware specification. No load-bearing self-citations, ansatzes, or renamings of known results are present. This is a standard engineering evaluation against external workloads.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Data watchpoints in standard hardware debug units can be configured to enforce write protection on a designated shadow stack region without interfering with normal program execution or other system functions.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: WATSON leverages the address-matching features of the debug unit to enforce the write protection of the shadow stack... temporarily granting write access... during function prologues
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We implemented a prototype of WATSON on the ARM Cortex-M architecture... overhead is 7.33% and 1.81% on BEEBS and CoreMark-Pro
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.