WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries
Pith reviewed 2026-06-27 09:06 UTC · model grok-4.3
The pith
WarpGuard enforces control-flow integrity on executed NVIDIA SASS binaries at protected sites using recovered instructions for policy derivation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WarpGuard is the first protected-site CFI system for CUDA device binaries operating on executed SASS. It enforces at protected sites: recovered SASS instructions or sequences that consume control-flow state, provide sufficient binary evidence to derive policy, are checked before release, and fail closed on violation. It authenticates backward-edge continuation state for instrumented returns, validates recoverable forward targets per site, and reports fixed-edge, unsupported, profile-excluded, fallback, and no-surface outcomes outside the protected denominator.
What carries the argument
Protected-site enforcement: WarpGuard recovers the SASS instructions at control-flow consumption sites, then derives and checks each site's policy from binary evidence alone.
If this is right
- It classifies 51,621 SASS control-flow sites across 77 CUDA artifacts, including 1,343 returns and 154 supported forward target-set entries.
- It records 52.2 million dynamic checks across the same 77 artifacts.
- In representative attacks, native execution allows attacker behavior, detect-only records violations, and enforcement fails closed before releasing invalid transfers.
- Public-code evidence shows that the same SASS consumption patterns occur in real CUDA systems, including runtime dispatch tables, cuFFT callbacks, generated callable tables, and uploaded device-function pointers.
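The three behaviors in the attack bullet above (native, detect-only, enforcement) can be illustrated with a toy host-side model of one forward-edge site. This is only a sketch under stated assumptions: the names `Mode`, `CfiViolation`, and `release_transfer`, and the addresses used, are hypothetical and do not come from the paper, and a Python exception stands in for whatever fail-closed action WarpGuard takes on the device.

```python
from enum import Enum


class Mode(Enum):
    NATIVE = "native"            # no check at all
    DETECT_ONLY = "detect-only"  # record the violation, still release
    ENFORCE = "enforce"          # record the violation, refuse to release


class CfiViolation(Exception):
    """Hypothetical stand-in for a fail-closed stop at a protected site."""


def release_transfer(mode, allowed_targets, target, log):
    """Return the target that control is released to, or raise in enforce mode."""
    if mode is Mode.NATIVE or target in allowed_targets:
        return target
    log.append(("violation", target))
    if mode is Mode.ENFORCE:
        raise CfiViolation(f"target {target:#x} is not in this site's target set")
    return target  # detect-only: the invalid transfer is still released


# Hypothetical per-site target set and attacker-chosen target.
ALLOWED = frozenset({0x1000, 0x1040})
ATTACKER = 0xDEAD0
```

In this model only the enforce mode prevents the corrupted target from being returned to the caller, which mirrors the "checked before release" wording of the core claim.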
Where Pith is reading between the lines
- Similar protected-site approaches might apply to other low-level GPU or accelerator binaries where source-level protections fall short.
- The separation of dynamic instrumentation from callback-free enforcement could reduce overhead in high-performance computing environments.
- Binary analysis tools for CUDA might incorporate these recovery techniques to audit control-flow surfaces without runtime support.
Load-bearing premise
That sufficient binary evidence exists at SASS consumption sites to derive a sound policy without requiring source, PTX, or runtime callbacks.
What would settle it
A deployed CUDA system where a control-flow corruption attack succeeds on an instrumented binary because the SASS site lacks enough evidence to derive or enforce the policy.
Original abstract
Recent CUDA exploitation work shows that GPU memory bugs can escalate into device-side control-flow corruption, as kernels later consume corrupted return continuations, function pointers, dispatch-table entries, or branch targets. For deployed CUDA binaries, the relevant security boundary is executed NVIDIA SASS, after PTX lowering, inlining, ABI decisions, register allocation, spills, predication, and SIMT execution; source- or PTX-level policies do not capture this boundary. We present WarpGuard, to our knowledge the first protected-site CFI system for CUDA device binaries operating on executed SASS. WarpGuard enforces at protected sites: recovered SASS instructions or sequences that consume control-flow state, provide sufficient binary evidence to derive policy, are checked before release, and fail closed on violation. It authenticates backward-edge continuation state for instrumented returns, validates recoverable forward targets per site, and reports fixed-edge, unsupported, profile-excluded, fallback, and no-surface outcomes outside the protected denominator. On 77 CUDA artifacts, WarpGuard classifies 51,621 SASS control-flow sites, including 1,343 returns and 154 supported forward target-set entries, and records 52.2 million dynamic checks. In representative backward- and forward-edge corruption attacks, native execution reaches attacker-selected behavior, detect-only mode records the expected violation, and enforcement fails closed before releasing the invalid protected transfer. Public-code evidence shows that the same SASS consumption patterns occur in real CUDA systems, including runtime dispatch tables, cuFFT callbacks, generated callable tables, and uploaded device-function pointers. WarpGuard delivers auditable protected-site CFI for CUDA SASS and separates dynamic-instrumentation enforcement from callback-free SASS timing and patch-cache feasibility.
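The abstract says WarpGuard "authenticates backward-edge continuation state" but this page does not say how. As a rough illustration of the general idea only, the sketch below tags a saved return address with a keyed MAC at call time and verifies it before the return is released. The use of HMAC, the key handling, the `frame_id` binding, and every function name here are assumptions made for illustration, not the paper's mechanism.

```python
import hashlib
import hmac
import os

# Hypothetical secret that the attacker is assumed unable to read or forge.
KEY = os.urandom(16)


def tag(return_addr: int, frame_id: int) -> bytes:
    """Compute a short MAC over the continuation and the frame it belongs to."""
    msg = return_addr.to_bytes(8, "little") + frame_id.to_bytes(8, "little")
    return hmac.new(KEY, msg, hashlib.sha256).digest()[:8]


def on_call(return_addr: int, frame_id: int) -> dict:
    """Model the saved continuation state written at an instrumented call."""
    return {"ret": return_addr, "tag": tag(return_addr, frame_id)}


def on_return(slot: dict, frame_id: int) -> int:
    """Verify the continuation before releasing the return; fail closed otherwise."""
    expected = tag(slot["ret"], frame_id)
    if not hmac.compare_digest(slot["tag"], expected):
        raise RuntimeError("backward-edge violation: return not released")
    return slot["ret"]
```

A memory bug that overwrites `slot["ret"]` without knowing `KEY` leaves a stale tag, so `on_return` raises instead of handing back the corrupted address.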
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents WarpGuard as the first protected-site CFI system for executed NVIDIA SASS binaries. It recovers SASS instructions or sequences at control-flow consumption sites, derives policies from binary evidence for backward-edge continuations and forward targets, performs checks before release, and fails closed on violation. Evaluation on 77 CUDA artifacts classifies 51,621 SASS sites (1,343 returns, 154 supported forward entries) with 52.2 million dynamic checks; attacks show native execution reaches attacker behavior while enforcement detects and blocks invalid transfers. Public-code evidence is cited for patterns in runtime dispatch tables, cuFFT callbacks, and callable tables.
Significance. If the central claims hold, WarpGuard fills a documented gap by operating after PTX lowering, inlining, register allocation, predication, and SIMT execution, where source/PTX policies are insufficient. The scale of the evaluation (77 artifacts, 51k+ sites, 52M checks) and explicit handling of fixed-edge/unsupported/fallback outcomes provide concrete evidence of practicality. The separation of enforcement from callback-free SASS timing is a clear strength.
major comments (3)
- [Evaluation] Evaluation section: the reported classification of 51,621 sites and 52.2 million checks supplies no implementation details, error bars, or verification that recovered SASS sites actually match the claimed policy derivation from binary evidence alone.
- [Policy Derivation] Policy derivation and protected-site definition: the claim that SASS consumption sites always yield sufficient static evidence for sound, complete policy (both 1,343 returns and 154 forward targets) is load-bearing, yet the evaluation on 77 artifacts does not demonstrate coverage for runtime dispatch tables, generated callable tables, or cuFFT callbacks where PTX lowering and inlining may leave ambiguous evidence.
- [Attack Evaluation] Attack evaluation: while native execution reaches attacker-selected behavior and enforcement fails closed, the paper does not quantify how often fallback or profile-excluded paths are taken, which directly affects the soundness claim for the protected denominator.
minor comments (2)
- [Abstract] Abstract: the phrase 'provide sufficient binary evidence to derive policy' is used without a precise definition or decision procedure that could be checked against the 51,621 sites.
- [Terminology] Terminology: 'fail closed' and 'no-surface outcomes' are introduced but lack a short formal statement of the exact failure semantics.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the significance of operating CFI at the SASS boundary. We address each major comment below, indicating planned revisions where appropriate.
- Referee: [Evaluation] Evaluation section: the reported classification of 51,621 sites and 52.2 million checks supplies no implementation details, error bars, or verification that recovered SASS sites actually match the claimed policy derivation from binary evidence alone.
  Authors: We agree that the evaluation section would benefit from additional implementation details. In the revised manuscript we will describe the SASS recovery and policy-derivation pipeline (including the binary analysis pass and how evidence is extracted at consumption sites), provide representative SASS snippets with the corresponding derived policies, and clarify that the reported figures are deterministic static counts rather than sampled measurements, rendering error bars inapplicable. These additions will directly verify that recovered sites match the binary-evidence claim.
  Revision: yes
- Referee: [Policy Derivation] Policy derivation and protected-site definition: the claim that SASS consumption sites always yield sufficient static evidence for sound, complete policy (both 1,343 returns and 154 forward targets) is load-bearing, yet the evaluation on 77 artifacts does not demonstrate coverage for runtime dispatch tables, generated callable tables, or cuFFT callbacks where PTX lowering and inlining may leave ambiguous evidence.
  Authors: The protected-site definition explicitly excludes sites lacking sufficient evidence (reporting them as unsupported or fallback). The 77 artifacts encompass production CUDA libraries that contain the cited patterns; public-code references already illustrate dispatch tables, cuFFT callbacks, and callable tables. To strengthen the presentation we will add a short subsection with concrete SASS excerpts from these categories, showing which sites were classified as protected versus unsupported. This constitutes a partial revision: the core soundness argument for the protected denominator remains unchanged, but explicit coverage examples will be supplied.
  Revision: partial
- Referee: [Attack Evaluation] Attack evaluation: while native execution reaches attacker-selected behavior and enforcement fails closed, the paper does not quantify how often fallback or profile-excluded paths are taken, which directly affects the soundness claim for the protected denominator.
  Authors: The evaluation already records fixed-edge, unsupported, profile-excluded, fallback, and no-surface outcomes for every site. We will augment the attack-evaluation section with a breakdown (table or text) reporting the observed frequencies of each category across the 51,621 sites. This will make explicit the size of the protected denominator and the fraction of paths that fall outside it, directly addressing the soundness concern.
  Revision: yes
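The per-category breakdown discussed in the last response amounts to simple bookkeeping, which can be sketched as follows. The five non-protected outcome names are taken from the abstract; the `PROTECTED` label, the `breakdown` helper, and the sample data are assumptions for illustration and are not the paper's results.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    PROTECTED = "protected"              # assumed label for sites inside the denominator
    FIXED_EDGE = "fixed-edge"
    UNSUPPORTED = "unsupported"
    PROFILE_EXCLUDED = "profile-excluded"
    FALLBACK = "fallback"
    NO_SURFACE = "no-surface"


def breakdown(outcomes):
    """Summarize per-site outcomes into a protected denominator and the rest."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    protected = counts.get(Outcome.PROTECTED, 0)
    return {
        "total": total,
        "protected_denominator": protected,
        "outside_denominator": total - protected,
        "by_outcome": {o.value: counts.get(o, 0) for o in Outcome},
    }


# Made-up toy input, not data from the paper.
sample = (
    [Outcome.PROTECTED] * 3
    + [Outcome.FIXED_EDGE] * 2
    + [Outcome.UNSUPPORTED]
)
summary = breakdown(sample)
```

Reporting `outside_denominator` next to `protected_denominator` is what would let a reader judge how much of the control-flow surface the enforcement claim actually covers.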
Circularity Check
No circularity; novel construction without reductions to fitted inputs or self-citations
The paper presents WarpGuard as an engineering construction for protected-site CFI on executed NVIDIA SASS binaries. No equations, fitted parameters, predictions, or first-principles derivations appear that could reduce to inputs by construction. The core mechanism (recovering SASS sequences at consumption sites to derive and enforce policy) is described as a new system rather than a renaming or self-referential fit. Evaluation on 77 artifacts and 51,621 sites is empirical reporting, not a statistical prediction forced by prior fits. No load-bearing self-citations, uniqueness theorems from prior author work, or smuggled ansatzes are invoked. The derivation chain is self-contained as a systems artifact.
Appendix A. Open Science
The anonymized WarpGuard tool and reproduction artifact are available at https://anonymous.4open.science/r/warpguard-anon/README.md. The package includes the tool source, CUDA/SASS fix...