WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

Igor Santos-Grueiro

arxiv: 2606.11871 · v1 · pith:PNRKFIWUnew · submitted 2026-06-10 · 💻 cs.CR

Pith reviewed 2026-06-27 09:06 UTC · model grok-4.3

classification 💻 cs.CR

keywords CUDA SASS · control-flow integrity · GPU binary protection · device-side CFI · protected sites · backward edge · forward edge · SASS recovery

The pith

WarpGuard enforces control-flow integrity on executed NVIDIA SASS binaries at protected sites using recovered instructions for policy derivation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WarpGuard, a system for applying control-flow integrity to CUDA device binaries at the SASS level. It identifies sites where SASS instructions consume control-flow state, recovers those instructions or sequences to derive policies from binary evidence alone, performs checks before releasing transfers, and fails closed if violations occur. This targets the security boundary after all compilation steps including PTX lowering and inlining, which source or PTX policies miss. A sympathetic reader would care because it enables protection for deployed binaries against attacks that corrupt return continuations or branch targets in GPU kernels.

Core claim

WarpGuard is, to the authors' knowledge, the first protected-site CFI system for CUDA device binaries operating on executed SASS. It enforces policy at protected sites, defined as recovered SASS instructions or sequences that consume control-flow state, provide sufficient binary evidence to derive a policy, are checked before release, and fail closed on violation. It authenticates backward-edge continuation state for instrumented returns, validates recoverable forward targets per site, and reports fixed-edge, unsupported, profile-excluded, fallback, and no-surface outcomes outside the protected denominator.
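To make "checked before release, and fail closed on violation" concrete, here is a minimal host-side model of a per-site forward-edge check. It is a sketch under assumptions, not the paper's implementation: `ProtectedSite`, `release_transfer`, and `CfiViolation` are hypothetical names, and the addresses are made up for illustration.

```python
from dataclasses import dataclass


class CfiViolation(Exception):
    """Raised when a protected transfer is refused (fail closed)."""


@dataclass(frozen=True)
class ProtectedSite:
    # Hypothetical record: the address of the SASS instruction that consumes
    # the target, plus the targets recovered for that one site.
    site_pc: int
    allowed_targets: frozenset


def release_transfer(site: ProtectedSite, target: int) -> int:
    """Check the target against the site's own set before releasing it."""
    if target not in site.allowed_targets:
        # Fail closed: the invalid transfer is never released.
        raise CfiViolation(
            f"site {site.site_pc:#x}: target {target:#x} is not allowed"
        )
    return target


# Illustrative values only; they do not come from the paper.
site = ProtectedSite(site_pc=0x1A0, allowed_targets=frozenset({0x400, 0x480}))
released = release_transfer(site, 0x400)
```

The point of keeping the target set on the site rather than in one global table is that a target valid for one indirect branch is still rejected at another.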

What carries the argument

Protected-site enforcement: WarpGuard recovers the SASS instructions at control-flow consumption sites, derives a policy from the binary evidence available there, and checks that policy before the transfer is released.

If this is right

WarpGuard classifies 51,621 SASS control-flow sites across 77 CUDA artifacts, including 1,343 returns and 154 supported forward target-set entries.
It records 52.2 million dynamic checks on those artifacts.
In representative attacks, native execution reaches attacker-selected behavior, detect-only mode records the violation, and enforcement fails closed before releasing the invalid transfer.
Public-code evidence shows that the same SASS consumption patterns occur in real CUDA systems, including runtime dispatch tables, cuFFT callbacks, and generated callable tables.
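The three behaviors reported for the attack experiments (native, detect-only, enforcement) can be modeled in a few lines. This is a sketch, assuming that detect-only mode lets the transfer proceed after recording it; the `Mode` enum and `handle_transfer` function are hypothetical names, not the tool's interface.

```python
from enum import Enum


class Mode(Enum):
    NATIVE = "native"            # no checks: whatever target arrives is taken
    DETECT_ONLY = "detect-only"  # violation is logged, transfer still released
    ENFORCE = "enforce"          # violation is logged, transfer is refused


def handle_transfer(mode, allowed, target, log):
    """Return the released target, or None if the transfer is blocked."""
    if mode is Mode.NATIVE:
        return target
    if target in allowed:
        return target
    log.append(target)  # record the violation in both checking modes
    if mode is Mode.DETECT_ONLY:
        return target
    return None  # fail closed


# Illustrative values: 0x400 is the only valid target, 0x666 is attacker-chosen.
allowed = {0x400}
results = {}
for mode in Mode:
    log = []
    results[mode] = (handle_transfer(mode, allowed, 0x666, log), log)
```

After the loop, `results` holds the released target and the violation log for each mode, which mirrors the three-way comparison the paper describes.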

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar protected-site approaches might apply to other low-level GPU or accelerator binaries where source-level protections fall short.
The separation of dynamic instrumentation from callback-free enforcement could reduce overhead in high-performance computing environments.
Binary analysis tools for CUDA might incorporate these recovery techniques to audit control-flow surfaces without runtime support.

Load-bearing premise

That sufficient binary evidence exists at SASS consumption sites to derive a sound policy without requiring source, PTX, or runtime callbacks.

What would settle it

A deployed CUDA system where a control-flow corruption attack succeeds on an instrumented binary because the SASS site lacks enough evidence to derive or enforce the policy.

Figures

Figures reproduced from arXiv: 2606.11871 by Igor Santos-Grueiro.

**Figure 1.** WarpGuard architecture. Host-side analysis recovers SASS sites and binds policy to the loaded image. Check placement is backend-specific: WG-NVBit places reference helper checks, WG-ST places matched timing patches, and WG-PC loads verified patch-cache entries for supported sm_89 surfaces. Runtime checks enforce supported sites; unsupported and profile-excluded sites are audited and excluded from protected…
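The caption says policy is bound to the loaded image but the excerpt does not say how. One plausible reading, sketched here as an assumption, is that a policy derived from one binary is only used when the loaded image is byte-identical to the analyzed one. The digest scheme and the names `bind_policy` and `policy_applies` are hypothetical.

```python
import hashlib


def image_digest(image_bytes: bytes) -> str:
    """SHA-256 of the device image, used here as its identity."""
    return hashlib.sha256(image_bytes).hexdigest()


def bind_policy(image_bytes, site_targets):
    """Attach a recovered policy to one specific image."""
    return {
        "image_sha256": image_digest(image_bytes),
        "sites": dict(site_targets),
    }


def policy_applies(policy, loaded_image):
    """Use the policy only if the loaded image matches the analyzed one."""
    return policy["image_sha256"] == image_digest(loaded_image)


# Placeholder bytes stand in for a real cubin; the site map is illustrative.
analyzed = b"placeholder device image"
policy = bind_policy(analyzed, {0x1A0: [0x400, 0x480]})
```

Binding by content rather than by file name means a rebuilt or patched binary cannot silently inherit a policy that was derived for different code.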

Original abstract

Recent CUDA exploitation work shows that GPU memory bugs can escalate into device-side control-flow corruption, as kernels later consume corrupted return continuations, function pointers, dispatch-table entries, or branch targets. For deployed CUDA binaries, the relevant security boundary is executed NVIDIA SASS, after PTX lowering, inlining, ABI decisions, register allocation, spills, predication, and SIMT execution; source- or PTX-level policies do not capture this boundary. We present WarpGuard, to our knowledge the first protected-site CFI system for CUDA device binaries operating on executed SASS. WarpGuard enforces at protected sites: recovered SASS instructions or sequences that consume control-flow state, provide sufficient binary evidence to derive policy, are checked before release, and fail closed on violation. It authenticates backward-edge continuation state for instrumented returns, validates recoverable forward targets per site, and reports fixed-edge, unsupported, profile-excluded, fallback, and no-surface outcomes outside the protected denominator. On 77 CUDA artifacts, WarpGuard classifies 51,621 SASS control-flow sites, including 1,343 returns and 154 supported forward target-set entries, and records 52.2 million dynamic checks. In representative backward- and forward-edge corruption attacks, native execution reaches attacker-selected behavior, detect-only mode records the expected violation, and enforcement fails closed before releasing the invalid protected transfer. Public-code evidence shows that the same SASS consumption patterns occur in real CUDA systems, including runtime dispatch tables, cuFFT callbacks, generated callable tables, and uploaded device-function pointers. WarpGuard delivers auditable protected-site CFI for CUDA SASS and separates dynamic-instrumentation enforcement from callback-free SASS timing and patch-cache feasibility.
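The abstract states that backward-edge continuation state is authenticated for instrumented returns, without naming the mechanism. As a purely illustrative stand-in, the sketch below tags a return continuation with a keyed MAC at call time and verifies it before the return is released. HMAC-SHA256, the key handling, and the function names are assumptions; a device-side implementation would likely use something much cheaper.

```python
import hashlib
import hmac


def tag_continuation(key: bytes, site_id: int, return_pc: int) -> bytes:
    """Compute a keyed tag over the site identity and the saved return address."""
    msg = site_id.to_bytes(8, "little") + return_pc.to_bytes(8, "little")
    return hmac.new(key, msg, hashlib.sha256).digest()


def checked_return(key: bytes, site_id: int, return_pc: int, stored_tag: bytes) -> int:
    """Release the return only if the continuation still matches its tag."""
    expected = tag_continuation(key, site_id, return_pc)
    if not hmac.compare_digest(expected, stored_tag):
        # Fail closed: a corrupted continuation is never followed.
        raise RuntimeError("continuation failed authentication; return not released")
    return return_pc


# Illustrative values only.
key = b"k" * 16
tag = tag_continuation(key, site_id=7, return_pc=0x1000)
ok_pc = checked_return(key, 7, 0x1000, tag)
```

Including the site identity in the tagged message keeps a valid continuation for one call site from being replayed at another.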

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

WarpGuard claims to be the first CFI system at the executed CUDA SASS level and backs the claim with working attack demonstrations, but the key assumption that protected sites carry enough static evidence needs close checking.

The main takeaway is that WarpGuard claims to be the first protected-site CFI that runs on executed CUDA SASS binaries after PTX lowering, inlining, register allocation, and predication. That level is the right security boundary for deployed code, and prior CFI work stayed at source or PTX.

The paper does a solid job on the problem statement and the concrete results. It classifies 51,621 sites across 77 artifacts, including 1,343 returns, runs 52.2 million dynamic checks, and shows enforcement that fails closed on both backward-edge and forward-edge corruption attacks. They also tie the protected patterns back to real CUDA code like dispatch tables, cuFFT callbacks, and callable tables, which strengthens the motivation.

The soft spot is the central assumption that SASS consumption sites always supply enough static evidence to derive a sound, complete policy without source or PTX. The stress-test point about runtime dispatch tables or generated callables potentially lacking unambiguous evidence is worth verifying in the full text. With only 154 supported forward target-set entries, it is not obvious how representative the evaluated sites are of cases where evidence could be thin, which might push the system toward over-approximation or fallbacks. The implementation details on site recovery and policy validation would need to be examined to confirm the claims hold.

This is for researchers working on GPU security or binary-level CFI. A reader in that niche would get value from the approach and the scale of the evaluation. It deserves a serious referee because it targets a real escalation path with a construction at the correct abstraction, even if some evaluation details on edge cases could be tighter.

Referee Report

3 major / 2 minor

Summary. The paper presents WarpGuard as the first protected-site CFI system for executed NVIDIA SASS binaries. It recovers SASS instructions or sequences at control-flow consumption sites, derives policies from binary evidence for backward-edge continuations and forward targets, performs checks before release, and fails closed on violation. Evaluation on 77 CUDA artifacts classifies 51,621 SASS sites (1,343 returns, 154 supported forward entries) with 52.2 million dynamic checks; attacks show native execution reaches attacker behavior while enforcement detects and blocks invalid transfers. Public-code evidence is cited for patterns in runtime dispatch tables, cuFFT callbacks, and callable tables.

Significance. If the central claims hold, WarpGuard fills a documented gap by operating after PTX lowering, inlining, register allocation, predication, and SIMT execution, where source/PTX policies are insufficient. The scale of the evaluation (77 artifacts, 51k+ sites, 52M checks) and explicit handling of fixed-edge/unsupported/fallback outcomes provide concrete evidence of practicality. The separation of enforcement from callback-free SASS timing is a clear strength.

major comments (3)

[Evaluation] Evaluation section: the reported classification of 51,621 sites and 52.2 million checks supplies no implementation details, error bars, or verification that recovered SASS sites actually match the claimed policy derivation from binary evidence alone.
[Policy Derivation] Policy derivation and protected-site definition: the claim that SASS consumption sites always yield sufficient static evidence for sound, complete policy (both 1,343 returns and 154 forward targets) is load-bearing, yet the evaluation on 77 artifacts does not demonstrate coverage for runtime dispatch tables, generated callable tables, or cuFFT callbacks where PTX lowering and inlining may leave ambiguous evidence.
[Attack Evaluation] Attack evaluation: while native execution reaches attacker-selected behavior and enforcement fails closed, the paper does not quantify how often fallback or profile-excluded paths are taken, which directly affects the soundness claim for the protected denominator.

minor comments (2)

[Abstract] Abstract: the phrase 'provide sufficient binary evidence to derive policy' is used without a precise definition or decision procedure that could be checked against the 51,621 sites.
[Terminology] Terminology: 'fail closed' and 'no-surface outcomes' are introduced but lack a short formal statement of the exact failure semantics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the significance of operating CFI at the SASS boundary. We address each major comment below, indicating planned revisions where appropriate.

Referee: [Evaluation] Evaluation section: the reported classification of 51,621 sites and 52.2 million checks supplies no implementation details, error bars, or verification that recovered SASS sites actually match the claimed policy derivation from binary evidence alone.

Authors: We agree that the evaluation section would benefit from additional implementation details. In the revised manuscript we will describe the SASS recovery and policy-derivation pipeline (including the binary analysis pass and how evidence is extracted at consumption sites), provide representative SASS snippets with the corresponding derived policies, and clarify that the reported figures are deterministic static counts rather than sampled measurements, rendering error bars inapplicable. These additions will directly verify that recovered sites match the binary-evidence claim.

revision: yes

Referee: [Policy Derivation] Policy derivation and protected-site definition: the claim that SASS consumption sites always yield sufficient static evidence for sound, complete policy (both 1,343 returns and 154 forward targets) is load-bearing, yet the evaluation on 77 artifacts does not demonstrate coverage for runtime dispatch tables, generated callable tables, or cuFFT callbacks where PTX lowering and inlining may leave ambiguous evidence.

Authors: The protected-site definition explicitly excludes sites lacking sufficient evidence (reporting them as unsupported or fallback). The 77 artifacts encompass production CUDA libraries that contain the cited patterns; public-code references already illustrate dispatch tables, cuFFT callbacks, and callable tables. To strengthen the presentation we will add a short subsection with concrete SASS excerpts from these categories, showing which sites were classified as protected versus unsupported. This constitutes a partial revision: the core soundness argument for the protected denominator remains unchanged, but explicit coverage examples will be supplied.

revision: partial

Referee: [Attack Evaluation] Attack evaluation: while native execution reaches attacker-selected behavior and enforcement fails closed, the paper does not quantify how often fallback or profile-excluded paths are taken, which directly affects the soundness claim for the protected denominator.

Authors: The evaluation already records fixed-edge, unsupported, profile-excluded, fallback, and no-surface outcomes for every site. We will augment the attack-evaluation section with a breakdown (table or text) reporting the observed frequencies of each category across the 51,621 sites. This will make explicit the size of the protected denominator and the fraction of paths that fall outside it, directly addressing the soundness concern.

revision: yes
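The promised breakdown amounts to a tally of per-site outcomes. A minimal sketch of that accounting follows; the five non-protected category names come from the abstract, while the `"protected"` label, the `summarize` function, and the toy counts are hypothetical and are not the paper's data.

```python
from collections import Counter

PROTECTED = "protected"  # hypothetical label for sites inside the denominator
OUTSIDE = ("fixed-edge", "unsupported", "profile-excluded", "fallback", "no-surface")


def summarize(outcomes):
    """Count sites per category and split them into protected vs. outside."""
    counts = Counter(outcomes)
    unknown = set(counts) - {PROTECTED, *OUTSIDE}
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    protected = counts[PROTECTED]
    return {
        "total": total,
        "protected": protected,
        "outside": total - protected,
        "by_category": {name: counts[name] for name in (PROTECTED, *OUTSIDE)},
    }


# Toy input for illustration only; these are not the paper's numbers.
toy = ["protected"] * 3 + ["fixed-edge"] * 5 + ["unsupported"] * 2
summary = summarize(toy)
```

Reporting every category, including those with a zero count, is what lets a reader confirm that the categories partition the full site population.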

Circularity Check

0 steps flagged

No circularity; novel construction without reductions to fitted inputs or self-citations

The paper presents WarpGuard as an engineering construction for protected-site CFI on executed NVIDIA SASS binaries. No equations, fitted parameters, predictions, or first-principles derivations appear that could reduce to inputs by construction. The core mechanism (recovering SASS sequences at consumption sites to derive and enforce policy) is described as a new system rather than a renaming or self-referential fit. Evaluation on 77 artifacts and 51,621 sites is empirical reporting, not a statistical prediction forced by prior fits. No load-bearing self-citations, uniqueness theorems from prior author work, or smuggled ansatzes are invoked. The derivation chain is self-contained as a systems artifact.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no free parameters, mathematical axioms, or new postulated entities; it relies on standard binary-analysis assumptions about the recoverability of control-flow sites.

Reference graph

Works this paper leans on

68 extracted references · 2 canonical work pages

[1]

Buffer overflow vulnerabilities in CUDA: a preliminary analysis,

A. Miele, “Buffer overflow vulnerabilities in CUDA: a preliminary analysis,”J. Comput. Virol. Hacking Tech., vol. 12, no. 2, pp. 113– 120, 2016

2016
[2]

A study of overflow vulnerabilities on gpus,

B. Di, J. Sun, and H. Chen, “A study of overflow vulnerabilities on gpus,” inProceedings of the 13th IFIP WG 10.3 International Conference on Network and Parallel Computing (NPC), ser. Lecture Notes in Computer Science, 2016, pp. 103–115

2016
[3]

GPU memory exploitation for fun and profit,

Y . Guo, Z. Zhang, and J. Yang, “GPU memory exploitation for fun and profit,” inProceedings of the 33rd USENIX Security Symposium (USENIX Security). USENIX Association, 2024, pp. 4033–4050

2024
[4]

Cuda, woulda, shoulda: Returning exploits in a sass-y world,

J. Roels, A. Jacobs, and S. V olckaert, “Cuda, woulda, shoulda: Returning exploits in a sass-y world,” inProceedings of the 18th European Workshop on Systems Security (EuroSec). ACM, 2025, pp. 40–48

2025
[5]

Control-flow integrity,

M. Abadi, M. Budiu, Ú. Erlingsson, and J. Ligatti, “Control-flow integrity,” inProceedings of the 12th ACM Conference on Computer and Communications Security (CCS). ACM, 2005, pp. 340–353

2005
[6]

Control-flow integrity: Precision, security, and perfor- mance,

N. Burow, S. A. Carr, J. Nash, P. Larsen, M. Franz, S. Brunthaler, and M. Payer, “Control-flow integrity: Precision, security, and perfor- mance,”ACM Comput. Surv., vol. 50, no. 1, pp. 16:1–16:33, 2017

2017
[7]

Stitching the gad- gets: On the ineffectiveness of coarse-grained control-flow integrity protection,

L. Davi, A. Sadeghi, D. Lehmann, and F. Monrose, “Stitching the gad- gets: On the ineffectiveness of coarse-grained control-flow integrity protection,” inProceedings of the 23rd USENIX Security Symposium (USENIX Security). USENIX Association, 2014, pp. 401–416

2014
[8]

Control-flow bending: On the effectiveness of control-flow integrity,

N. Carlini, A. Barresi, M. Payer, D. A. Wagner, and T. R. Gross, “Control-flow bending: On the effectiveness of control-flow integrity,” inProceedings of the 24th USENIX Security Symposium (USENIX Security). USENIX Association, 2015, pp. 161–176

2015
[9]

Control jujutsu: On the weaknesses of fine-grained control flow integrity,

I. Evans, F. Long, U. Otgonbaatar, H. E. Shrobe, M. C. Rinard, H. Okhravi, and S. Sidiroglou-Douskos, “Control jujutsu: On the weaknesses of fine-grained control flow integrity,” inProceedings of the 22nd ACM SIGSAC Conference on Computer and Communica- tions Security (CCS). ACM, 2015, pp. 901–913

2015
[10]

CUDA Compiler Driver NVCC,

NVIDIA, “CUDA Compiler Driver NVCC,” https://docs.nvidia.com/ cuda/cuda-compiler-driver-nvcc/, 2026, accessed 2026-05-14

2026
[11]

Parallel Thread Execution ISA,

——, “Parallel Thread Execution ISA,” https://docs.nvidia.com/cuda/ parallel-thread-execution/, 2026, accessed 2026-05-12

2026
[12]

CUDA Binary Utilities,

——, “CUDA Binary Utilities,” https://docs.nvidia.com/cuda/ cuda-binary-utilities/, 2026, accessed 2026-05-12

2026
[13]

CUDA C++ Programming Guide,

——, “CUDA C++ Programming Guide,” https://docs.nvidia.com/ cuda/cuda-c-programming-guide/, 2026, accessed 2026-05-12

2026
[14]

Control flow management in modern gpus,

M. A. Shoushtary, J. T. Murgadas, and A. González, “Control flow management in modern gpus,”CoRR, vol. abs/2407.02944, 2024

work page arXiv 2024
[15]

PPL-CUDA-SMC,

PPL-CUDA-SMC Contributors, “PPL-CUDA-SMC,” https://github. com/JoeyOhman/PPL-CUDA-SMC, 2026, gitHub repository; ac- cessed 2026-05-23

2026
[16]

Pacific Northwest National Laboratory, “SV-Sim,” https://github.com/ pnnl/SV-Sim, 2026, gitHub repository; accessed 2026-05-23

2026
[17]

——, “DM-Sim,” https://github.com/pnnl/DM-Sim, 2026, gitHub repository; accessed 2026-05-23

2026
[18]

Demystifying and Exploiting ASLR on NVIDIA GPUs,

R. Zhu, G. Chen, W. Shen, L. Zhang, D. Shen, R. Chang, and Y . Guo, “Demystifying and Exploiting ASLR on NVIDIA GPUs,” inProceedings of the 47th IEEE Symposium on Security and Privacy (S&P). IEEE, 2026

2026
[19]

GHost in the SHELL: A GPU-to-Host Memory Attack and Its Mitigation,

S. Roh, W. Choi, J. Chung, Y . Lee, S. Song, and B. Lee, “GHost in the SHELL: A GPU-to-Host Memory Attack and Its Mitigation,” in Proceedings of the 47th IEEE Symposium on Security and Privacy (S&P). IEEE, 2026

2026
[20]

Practical control flow integrity and random- ization for binary executables,

C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou, “Practical control flow integrity and random- ization for binary executables,” inProceedings of the 34th IEEE Symposium on Security and Privacy (S&P). IEEE Computer Society, 2013, pp. 559–573

2013
[21]

Control flow integrity for COTS binaries,

M. Zhang and R. Sekar, “Control flow integrity for COTS binaries,” inProceedings of the 22nd USENIX Security Symposium (USENIX Security). USENIX Association, 2013, pp. 337–352

2013
[22]

Securing GPU via region-based bounds checking,

J. Lee, Y . Kim, J. Cao, E. Kim, J. Lee, and H. Kim, “Securing GPU via region-based bounds checking,” inProceedings of the 49th Annual International Symposium on Computer Architecture (ISCA). ACM, 2022, pp. 27–41

2022
[23]

Guardian: Safe GPU sharing in multi-tenant environments,

M. Pavlidakis, G. Vasiliadis, S. Mavridis, A. Argyros, A. Chaz- apis, and A. Bilas, “Guardian: Safe GPU sharing in multi-tenant environments,” inProceedings of the 25th International Middleware Conference (Middleware). ACM, 2024, pp. 313–326

2024
[24]

Gpuarmor: A hardware-software co-design for efficient and scalable memory safety on gpus,

M. T. I. Ziad, S. Damani, M. Stephenson, S. W. Keckler, and A. Jaleel, “Gpuarmor: A hardware-software co-design for efficient and scalable memory safety on gpus,”CoRR, vol. abs/2502.17780, 2025

work page arXiv 2025
[25]

Nvbit: A dynamic binary instrumentation framework for NVIDIA gpus,

O. Villa, M. Stephenson, D. W. Nellans, and S. W. Keckler, “Nvbit: A dynamic binary instrumentation framework for NVIDIA gpus,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). ACM, 2019, pp. 372–383

2019
[26]

The geometry of innocent flesh on the bone: return- into-libc without function calls (on the x86),

H. Shacham, “The geometry of innocent flesh on the bone: return- into-libc without function calls (on the x86),” inProceedings of the 14th ACM Conference on Computer and Communications Security (CCS). ACM, 2007, pp. 552–561

2007
[27]

Return- oriented programming: Systems, languages, and applications,

R. Roemer, E. Buchanan, H. Shacham, and S. Savage, “Return- oriented programming: Systems, languages, and applications,”ACM Trans. Inf. Syst. Secur ., vol. 15, no. 1, pp. 2:1–2:34, 2012

2012
[28]

Return-oriented programming without returns,

S. Checkoway, L. Davi, A. Dmitrienko, A. Sadeghi, H. Shacham, and M. Winandy, “Return-oriented programming without returns,” in Proceedings of the 17th ACM Conference on Computer and Commu- nications Security (CCS). ACM, 2010, pp. 559–572

2010
[29]

Jump-oriented programming: a new class of code-reuse attack,

T. K. Bletsch, X. Jiang, V . W. Freeh, and Z. Liang, “Jump-oriented programming: a new class of code-reuse attack,” inProceedings of the 6th ACM Symposium on Information, Computer and Communications Security (ASIACCS). ACM, 2011, pp. 30–40

2011
[30]

Counterfeit object-oriented programming: On the difficulty of preventing code reuse attacks in C++ applications,

F. Schuster, T. Tendyck, C. Liebchen, L. Davi, A. Sadeghi, and T. Holz, “Counterfeit object-oriented programming: On the difficulty of preventing code reuse attacks in C++ applications,” inProceedings of the 36th IEEE Symposium on Security and Privacy (S&P). IEEE Computer Society, 2015, pp. 745–762

2015
[31]

Enforcing forward-edge control-flow integrity in GCC & LLVM,

C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, Ú. Erlings- son, L. Lozano, and G. Pike, “Enforcing forward-edge control-flow integrity in GCC & LLVM,” inProceedings of the 23rd USENIX Security Symposium (USENIX Security). USENIX Association, 2014, pp. 941–955

2014
[32]

Control Flow Integrity,

LLVM Project, “Control Flow Integrity,” https://clang.llvm.org/docs/ ControlFlowIntegrity.html, 2026, accessed 2026-05-12

2026
[33]

ShadowCallStack,

——, “ShadowCallStack,” https://clang.llvm.org/docs/ ShadowCallStack.html, 2026, accessed 2026-05-14

2026
[34]

/guard: Enable Control Flow Guard,

Microsoft, “/guard: Enable Control Flow Guard,” https://learn. microsoft.com/cpp/build/reference/guard-enable-control-flow-guard, 2025, accessed 2026-05-14

2025
[35]

A Technical Look at Intel Control-Flow Enforcement Technol- ogy,

Intel, “A Technical Look at Intel Control-Flow Enforcement Technol- ogy,” https://www.intel.com/content/www/us/en/developer/articles/ technical/technical-look-control-flow-enforcement-technology.html, 2020, accessed 2026-05-14

2020
[36]

Improving Control Flow Integrity with Pointer Authen- tication,

Apple, “Improving Control Flow Integrity with Pointer Authen- tication,” https://developer.apple.com/documentation/apple-silicon/ improving-control-flow-integrity-with-pointer-authentication, 2026, accessed 2026-05-14

2026
[37]

RISC-V Control-flow Integrity Extensions,

RISC-V International, “RISC-V Control-flow Integrity Extensions,” https://docs.riscv.org/reference/isa/priv/priv-cfi.html, 2026, accessed 2026-05-14

2026
[38]

Practical context-sensitive CFI,

V . van der Veen, D. Andriesse, E. Göktas, B. Gras, L. Sambuc, A. Slowinska, H. Bos, and C. Giuffrida, “Practical context-sensitive CFI,” inProceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2015, pp. 927–940

2015
[39]

A tough call: Mitigating advanced code-reuse attacks at the binary level,

V . van der Veen, E. Göktas, M. Contag, A. Pawlowski, X. Chen, S. Rawat, H. Bos, T. Holz, E. Athanasopoulos, and C. Giuffrida, “A tough call: Mitigating advanced code-reuse attacks at the binary level,” inProceedings of the 37th IEEE Symposium on Security and Privacy (S&P). IEEE Computer Society, 2016, pp. 934–953

2016
[40]

τcfi: Type-assisted control flow integrity for x86-64 binaries,

P. Muntean, M. Fischer, G. Tan, Z. Lin, J. Grossklags, and C. Eck- ert, “τcfi: Type-assisted control flow integrity for x86-64 binaries,” inProceedings of the 21st International Symposium on Research in Attacks, Intrusions and Defenses (RAID), ser. Lecture Notes in Computer Science. Springer, 2018, pp. 423–444

2018
[41]

Enforcing unique code target property for control-flow integrity,

H. Hu, C. Qian, C. Yagemann, S. P. H. Chung, W. R. Harris, T. Kim, and W. Lee, “Enforcing unique code target property for control-flow integrity,” inProceedings of the 25th ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2018, pp. 1470–1486

2018
[42]

Losing control: On the effectiveness of control-flow integrity under stack attacks,

M. Conti, S. Crane, L. Davi, M. Franz, P. Larsen, M. Negro, C. Liebchen, M. Qunaibit, and A. Sadeghi, “Losing control: On the effectiveness of control-flow integrity under stack attacks,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2015, pp. 952–963

2015
[43]

CONFIRM: evaluating compatibility and relevance of control-flow integrity protections for modern software,

X. Xu, M. Ghaffarinia, W. Wang, K. W. Hamlen, and Z. Lin, “CONFIRM: evaluating compatibility and relevance of control-flow integrity protections for modern software,” inProceedings of the 28th USENIX Security Symposium (USENIX Security). USENIX Association, 2019, pp. 1805–1821

2019
[44]

Cfinsight: A comprehensive metric for CFI policies,

T. Frassetto, P. Jauernig, D. Koisser, and A. Sadeghi, “Cfinsight: A comprehensive metric for CFI policies,” inProceedings of the 29th Annual Network and Distributed System Security Symposium (NDSS). The Internet Society, 2022

2022
[45]

CUDA leaks: A detailed hack for CUDA and a (partial) fix,

R. D. Pietro, F. Lombardi, and A. Villani, “CUDA leaks: A detailed hack for CUDA and a (partial) fix,”ACM Trans. Embed. Comput. Syst., vol. 15, no. 1, pp. 15:1–15:25, 2016

2016
[46]

cucatch: A debugging tool for efficiently catching memory safety violations in CUDA applications,

M. T. I. Ziad, S. Damani, A. Jaleel, S. W. Keckler, and M. Stephenson, “cucatch: A debugging tool for efficiently catching memory safety violations in CUDA applications,”Proc. ACM Program. Lang., vol. 7, no. PLDI, pp. 124–147, 2023

2023
[47]

CuSafe: Capturing Memory Corruption on NVIDIA GPUs,

H. Lu, F. Zhang, Z. Zhang, S. Wang, and Y . Guo, “CuSafe: Capturing Memory Corruption on NVIDIA GPUs,” inProceedings of the 35th USENIX Security Symposium (USENIX Security). USENIX Association, 2026. [Online]. Available: https://www.usenix. org/conference/usenixsecurity26/cycle1-accepted-papers

2026
[48]

Compute Sanitizer,

NVIDIA, “Compute Sanitizer,” https://docs.nvidia.com/cuda/ compute-sanitizer/, 2026, accessed 2026-05-14

2026
[49]

NVIDIA Multi-Instance GPU User Guide,

——, “NVIDIA Multi-Instance GPU User Guide,” https://docs.nvidia. com/datacenter/tesla/mig-user-guide/, 2026, accessed 2026-05-14

2026
[50]

Graviton: Trusted execution en- vironments on gpus,

S. V olos, K. Vaswani, and R. Bruno, “Graviton: Trusted execution en- vironments on gpus,” inProceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, 2018, pp. 681–696

2018
[51]

Hetero- geneous isolated execution for commodity gpus,

I. Jang, A. Tang, T. Kim, S. Sethumadhavan, and J. Huh, “Hetero- geneous isolated execution for commodity gpus,” inProceedings of the 24th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, 2019, pp. 455–468

2019
[52]

Telekine: Secure computing with cloud gpus,

T. Hunt, Z. Jia, V . Miller, A. Szekely, Y . Hu, C. J. Rossbach, and E. Witchel, “Telekine: Secure computing with cloud gpus,” in Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association, 2020, pp. 817–833

2020
[53]

Honeycomb: Secure and efficient GPU executions via static validation,

H. Mai, J. Zhao, H. Zheng, Y . Zhao, Z. Liu, M. Gao, C. Wang, H. Cui, X. Feng, and C. Kozyrakis, “Honeycomb: Secure and efficient GPU executions via static validation,” inProceedings of the 17th USENIX Symposium on Operating Systems Design and Implementa- tion (OSDI). USENIX Association, 2023, pp. 155–172

2023
[54]

SAGE: software-based attestation for GPU execution,

A. Ivanov, B. Rothenberger, A. Dethise, M. Canini, T. Hoefler, and A. Perrig, “SAGE: software-based attestation for GPU execution,” inProceedings of the 2023 USENIX Annual Technical Conference (USENIX ATC). USENIX Association, 2023, pp. 485–499

2023
[55]

NVIDIA Confidential Computing,

NVIDIA, “NVIDIA Confidential Computing,” https://www.nvidia. com/en-us/data-center/solutions/confidential-computing/, 2026, ac- cessed 2026-05-14

2026
[56]

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems,

G. F. Diamos, A. Kerr, S. Yalamanchili, and N. Clark, “Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems,” inProceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT). ACM, 2010, pp. 353–364

2010
[57]

Flexible software profiling of GPU architectures,

M. Stephenson, S. K. S. Hari, Y . Lee, E. Ebrahimi, D. R. Johnson, D. W. Nellans, M. O’Connor, and S. W. Keckler, “Flexible software profiling of GPU architectures,” inProceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA). ACM, 2015, pp. 185–197

2015
[58]

Nvbitfi: Dynamic fault injection for gpus,

T. Tsai, S. K. S. Hari, M. B. Sullivan, O. Villa, and S. W. Keckler, “Nvbitfi: Dynamic fault injection for gpus,” inProceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2021, pp. 284–291

2021
[59]

NVLift: Lifting NVIDIA GPU Assembly to LLVM IR for Downstream Security Applications,

J. Wan, L. Z.-H. Tan, and D. J. Tian, “NVLift: Lifting NVIDIA GPU Assembly to LLVM IR for Downstream Security Applications,” in Proceedings of the Workshop on Binary Analysis Research (BAR). Internet Society, 2026

2026
[60]

NVIDIA, “CUDA-Q,” https://github.com/NVIDIA/cuda-quantum, 2026, gitHub repository; accessed 2026-05-23

2026
[61]

GooFit Contributors, “GooFit,” https://github.com/GooFit/GooFit, 2026, gitHub repository; accessed 2026-05-23

2026
[62]

Kokkos Contributors, “Kokkos,” https://github.com/kokkos/kokkos, 2026, GitHub repository; accessed 2026-05-23

2026
[63]

CUDA Samples,

NVIDIA, “CUDA Samples,” https://github.com/NVIDIA/cuda-samples, 2026, GitHub repository; accessed 2026-05-23

2026
[64]

rawspec,

UCBerkeleySETI, “rawspec,” https://github.com/UCBerkeleySETI/rawspec, 2026, GitHub repository; accessed 2026-05-23

2026
[65]

ptypy Contributors, “ptypy,” https://github.com/ptycho/ptypy, 2026, GitHub repository; accessed 2026-05-23

2026
[66]

empi Contributors, “empi,” https://github.com/develancer/empi, 2026, GitHub repository; accessed 2026-05-23

2026
[67]

Dr.Jit Contributors, “Dr.Jit,” https://github.com/mitsuba-renderer/drjit, 2026, GitHub repository; accessed 2026-05-23

2026
[68]

GooFit Issue #242: kMatrix/Amp3Body,

GooFit Contributors, “GooFit Issue #242: kMatrix/Amp3Body,” https://github.com/GooFit/GooFit/issues/242, 2026, GitHub issue; accessed 2026-05-23

2026

Appendix A. Open Science

The anonymized WarpGuard tool and reproduction artifact are available at https://anonymous.4open.science/r/warpguard-anon/README.md. The package includes the tool source, CUDA/SASS fix...
