
arxiv: 2507.02770 · v2 · submitted 2025-07-03 · 💻 cs.CR

Blueprint, Bootstrap, and Bridge: A Security Look at NVIDIA GPU Confidential Computing

Pith reviewed 2026-05-19 06:04 UTC · model grok-4.3

classification 💻 cs.CR
keywords GPU Confidential Computing · NVIDIA GPU · Security Analysis · Confidential Computing · Data Protection · System Architecture · Bootstrap Process · AI Workload Security

The pith

NVIDIA GPU confidential computing keeps data transfers protected across the CPU-GPU bridge under its threat model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reconstructs a view of NVIDIA's proprietary GPU Confidential Computing system to examine its security architecture and verify that data stays protected during transfers. It breaks down the specialized hardware engines that enforce isolation, the bootstrap sequence that activates those protections, and targeted checks on data movement between the trusted CPU domain and the GPU. A reader would care because GPU-CC lets existing AI applications run securely with no code changes, yet its closed design leaves open questions about whether those protections actually hold for all transfer paths.

Core claim

By mapping the system's blueprint, bootstrap process, and bridge mechanisms, the authors establish that under the GPU-CC threat model data transfers along different paths remain protected when crossing between the trusted CPU and GPU domains.

What carries the argument

The GPU-CC bridge that coordinates hardware engines and software components to maintain isolation and protection for data moving between CPU and GPU domains.

If this is right

  • Existing AI workloads can continue to use GPU-CC without code changes while retaining the claimed protections.
  • Data movement between CPU and GPU domains stays isolated even across multiple transfer paths.
  • The bootstrap sequence successfully coordinates hardware and software to activate those protections before workloads run.
  • Security researchers gain a usable model for further inspection of the closed GPU-CC implementation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar reconstruction techniques could be applied to other proprietary confidential-computing stacks to check cross-domain data protection.
  • If the protections hold, GPU-CC could serve as a template for hardware vendors seeking to add secure acceleration to AI pipelines without requiring application rewrites.
  • The findings open the door to automated tools that monitor the bridge for deviations from the expected protected behavior in production clusters.

Load-bearing premise

The reconstruction of a coherent system view from proprietary components is accurate enough to support the security experiments and conclusions.

What would settle it

An experiment that demonstrates a data leak or exposure along any transfer path when running under the documented GPU-CC threat model and configuration.
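
One way to frame such an experiment, purely as a sketch: plant a random marker (a canary) in the data sent from the confidential VM to the GPU, capture whatever the untrusted side can observe on each transfer path, and search those captures for the marker in plaintext. The Python below assumes the captures already exist as byte strings. The path names, the `make_canary` and `find_leaks` helpers, and the capture method are hypothetical illustrations, not the paper's tooling.

```python
import os
from typing import Dict, List


def make_canary(n_bytes: int = 32) -> bytes:
    """Return a random marker that is unlikely to occur by chance in a capture."""
    return os.urandom(n_bytes)


def find_leaks(canary: bytes, captures: Dict[str, bytes]) -> Dict[str, List[int]]:
    """Map each transfer-path name to the offsets where the canary appears in plaintext.

    `captures` is assumed to hold bytes observed from the untrusted side
    (hypothetical input; how they are obtained is out of scope here).
    Paths with no plaintext hit are omitted from the result.
    """
    leaks: Dict[str, List[int]] = {}
    for path, blob in captures.items():
        offsets = []
        start = blob.find(canary)
        while start != -1:
            offsets.append(start)
            start = blob.find(canary, start + 1)
        if offsets:
            leaks[path] = offsets
    return leaks


if __name__ == "__main__":
    canary = make_canary()
    # Simulated captures: one path exposes the marker, the other does not.
    captures = {
        "hypothetical_path_a": b"\x00" * 8 + canary + b"\x00" * 8,
        "hypothetical_path_b": os.urandom(64),
    }
    print(find_leaks(canary, captures))
```

A hit indicates exposure on that path. An empty result only shows that this marker was not seen in these captures; it does not cover metadata or timing side channels.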

Figures

Figures reproduced from arXiv: 2507.02770 by Enriquillo Valdez, Hani Jamjoom, Julian James Stephen, Michael Le, Salman Ahmed, Shixuan Zhao, Zhiqiang Lin, Zhongshu Gu.

Figure 1: Secure Data Exchange between CVM and GPU
Figure 2: The Software/Hardware Elements in GPU-CC
Figure 3: Classification of Data Read via GPU’s BAR0
Figure 4: Information Leakage in GSP-RM RPC
  … queue elements, one entry for the RX queue’s header, and 63 entries for RX queue elements. If an attacker gains access to the physical address table, they can easily locate all elements in the TX/RX queues. For example, the second entry (0x000000016921f000) points to the TX/RX queue’s header, where the readPtr and writePtr fields indicate the next elements to read and wri…
Figure 5: Timing Channels in CPU-GSP Memory Transfers
Figure 6: The Interactions among SEC2, WLC, and LCIC
Figure 7: GPU’s Device and RIM File Certificate Chains
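
The Figure 4 excerpt above describes queues whose header carries `readPtr` and `writePtr` fields and whose elements are reachable through a table of physical addresses. The sketch below illustrates, under stated assumptions, why that table plus the header would be enough to locate in-flight elements. It assumes the two pointers are element indices that wrap modulo the queue length and that equal pointers mean an empty queue; the excerpt does not state the actual GSP-RM encoding. `QueueHeader`, `pending_indices`, and `pending_addresses` are hypothetical names, and the page addresses in the demo are made up.

```python
from typing import List


class QueueHeader:
    """Hypothetical view of a queue header with the two fields named in the excerpt."""

    def __init__(self, read_ptr: int, write_ptr: int) -> None:
        self.read_ptr = read_ptr    # assumed: index of the next element to read
        self.write_ptr = write_ptr  # assumed: index of the next element to write


def pending_indices(header: QueueHeader, num_elements: int) -> List[int]:
    """Indices of elements written but not yet read, in consumption order.

    Assumes a ring layout where read_ptr == write_ptr means "empty".
    """
    count = (header.write_ptr - header.read_ptr) % num_elements
    return [(header.read_ptr + i) % num_elements for i in range(count)]


def pending_addresses(header: QueueHeader, element_pages: List[int]) -> List[int]:
    """Translate pending indices into addresses using the address table."""
    return [element_pages[i] for i in pending_indices(header, len(element_pages))]


if __name__ == "__main__":
    # 63 element entries, matching the RX element count quoted in the excerpt.
    pages = [0x1000 * (i + 1) for i in range(63)]  # made-up addresses
    header = QueueHeader(read_ptr=61, write_ptr=2)
    print([hex(a) for a in pending_addresses(header, pages)])
```

The point of the sketch is the dependency, not the format: an observer who can read the address table and the header needs no further information to find the elements currently in transit.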
Original abstract

NVIDIA GPU Confidential Computing (GPU-CC) aims to provide secure execution for AI workloads. For end users, enabling GPU-CC is seamless and requires no modifications to existing applications. However, this ease of adoption relies on a proprietary and highly complex system that is difficult to inspect, creating challenges for researchers seeking to understand its architecture and security landscape. In this work, we provide a security look at GPU-CC by reconstructing a coherent view of the system. We first examine the system's blueprint, focusing on the specialized architectural engines that support its security mechanisms. We then analyze the bootstrap process, which coordinates hardware and software components to establish these protections. Finally, we conduct targeted experiments to assess whether, under the GPU-CC threat model, data transfers along different paths remain protected across the bridge between trusted CPU and GPU domains. We responsibly disclosed all security findings presented in this paper to the NVIDIA Product Security Incident Response Team (PSIRT).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper reconstructs a coherent architectural view of NVIDIA GPU Confidential Computing (GPU-CC) by first detailing the blueprint of specialized hardware engines supporting its security mechanisms, then analyzing the bootstrap process that coordinates hardware and software to establish trusted domains, and finally reporting targeted experiments assessing whether data transfers along different paths remain protected across the CPU-GPU bridge under the stated GPU-CC threat model. All security findings were responsibly disclosed to NVIDIA PSIRT.

Significance. If the reconstruction accurately captures the closed-source system and the experiments comprehensively cover relevant paths, the work would provide a valuable public analysis of protections for AI workloads on widely deployed NVIDIA hardware. The responsible disclosure and focus on a practical threat model are strengths; however, the absence of independent corroboration (such as a machine-checked model or vendor diagram) limits the strength of the security conclusions.

major comments (1)
  1. [Bridge / targeted experiments] The central claim that data transfers remain protected across the bridge depends on the completeness of the proprietary reconstruction described in the blueprint and bootstrap sections. Without additional validation (e.g., cross-checks against public documentation, a formal model, or explicit discussion of how omitted mechanisms were ruled out), it is difficult to confirm that all relevant data paths were examined in the targeted experiments.
minor comments (2)
  1. [Abstract] The abstract describes the methodology but does not summarize the concrete outcomes of the targeted experiments (e.g., which paths were tested and what the results showed).
  2. Notation for trust domains and data paths could be introduced earlier with a diagram to improve readability when discussing the bridge experiments.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for recognizing the strengths of our responsible disclosure and practical threat model focus. We address the major comment below, revising the manuscript to improve transparency around our reconstruction methodology while remaining honest about inherent limitations of analyzing a closed-source system.

Point-by-point responses
  1. Referee: [Bridge / targeted experiments] The central claim that data transfers remain protected across the bridge depends on the completeness of the proprietary reconstruction described in the blueprint and bootstrap sections. Without additional validation (e.g., cross-checks against public documentation, a formal model, or explicit discussion of how omitted mechanisms were ruled out), it is difficult to confirm that all relevant data paths were examined in the targeted experiments.

    Authors: We agree that the strength of our claims on protected data transfers rests on the accuracy and completeness of the blueprint and bootstrap reconstruction. Our analysis draws on public NVIDIA documentation on GPU-CC, hardware interface probing, and empirical experiments across multiple transfer paths under the stated threat model. In the revised manuscript, we have added an explicit subsection detailing our cross-checks against available public specifications, the criteria used to identify and rule out alternative mechanisms, and consistency verification across GPU configurations. These additions aim to make the coverage of relevant paths more transparent. A machine-checked formal model or internal vendor diagrams cannot be provided, as the system is proprietary and such artifacts are not accessible through public channels or the disclosure process.

    Revision: partial

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

Full rationale

The paper reconstructs a coherent view of the proprietary GPU-CC system by examining its blueprint and bootstrap process, then performs targeted experiments to assess data transfer protections under the stated threat model. This chain relies on empirical observation and analysis rather than any self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or derivations are presented that reduce the central claim to its own inputs by construction; the assessment is externally falsifiable via the experiments and responsible disclosure process. The reconstruction serves as an input step whose accuracy is a standard limitation for closed-source systems, not a circularity mechanism.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis depends on the vendor-defined threat model and the feasibility of reconstructing internal behavior from observable interfaces without full source access.

axioms (1)
  • Domain assumption: The GPU-CC threat model accurately reflects the intended security boundaries.
    All experiments and conclusions are conditioned on this model as stated by NVIDIA.




Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GPUBreach: Privilege Escalation Attacks on GPUs using Rowhammer

    cs.CR 2026-05 unverdicted novelty 8.0

    Unprivileged CUDA kernels can use Rowhammer to tamper with GPU page tables for targeted privilege escalation, leaking cryptographic keys and escalating to CPU root access by bypassing IOMMU.

  2. Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight

    cs.PF 2026-04 conditional novelty 7.0

    A technique recovers complete GPU hardware command streams from NVIDIA's closed-source CUDA driver via kernel instrumentation and doorbell watchpoints, demonstrated on data movement and CUDA Graphs.

  3. When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

    cs.CR 2026-05 unverdicted novelty 5.0

    A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.

  4. When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

    cs.CR 2026-05 unverdicted novelty 4.0

    A structured survey of confidential computing for agentic AI that catalogs TEE platforms, agent-specific threats, transferable defenses, and remaining gaps in end-to-end frameworks.

A Platform Configurations

Our host system is equipped with dual-socket AMD EPYC 9634 84-core processors, with SEV-SNP enabled, and an 8-GPU NVIDIA H100 SXM5 setup. Each GPU has 80 GB of GPU memory...