Unprivileged Topology Certificates for Cloud GPU Attestation

Faruk Alpay; Taylan Alpay

arxiv: 2606.24934 · v1 · pith:MMJKNQDSnew · submitted 2026-06-22 · 💻 cs.CR · cs.AR

Pith reviewed 2026-06-26 07:59 UTC · model grok-4.3

keywords cloud GPU attestation · latency fingerprint · CUDA probe · topology certificate · unprivileged attestation · physical fingerprint · network landmarks · HBM sweep

The pith

CUDA latency maps from ordinary code create certificates attesting cloud GPU identity, class, and coarse location without privileged access or vendor keys.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that ordinary CUDA code running on cloud GPUs can generate certificates attesting to the physical identity of the accelerator, its hardware topology class, and a coarse geographic location. These certificates rely on a measured SM-by-memory-region latency matrix that acts as a stable fingerprint with very low temporal variation, combined with HBM sweep data for topology and public network probes for location. A verifier can check the committed statistics and hashes without needing access to a GPU. If these measurements cannot be forged undetectably, cloud tenants gain a way to confirm they are using the claimed hardware rather than a substitute. This addresses the lack of direct inspection in cloud environments where only model name and region are provided.
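The GPU-free check described above can be pictured as a commitment over the certificate body. What follows is a minimal sketch, not the paper's format: the field names, the canonical-JSON encoding, and the functions `make_certificate` and `verify_certificate` are all hypothetical, and a bare SHA-256 commitment is assumed in place of whatever binding the authors actually use.

```python
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def make_certificate(stats, config, code_bytes, network_evidence, archive_bytes):
    """Bundle the committed items; every field name here is an assumption."""
    body = {
        "stats": stats,
        "config": config,
        "code_sha256": hashlib.sha256(code_bytes).hexdigest(),
        "network": network_evidence,
        "archive_sha256": hashlib.sha256(archive_bytes).hexdigest(),
    }
    return {"body": body, "commitment": digest(body)}


def verify_certificate(cert, code_bytes, archive_bytes) -> bool:
    """Recompute every hash on the verifier side; no GPU is involved."""
    body = cert["body"]
    return (
        cert["commitment"] == digest(body)
        and body["code_sha256"] == hashlib.sha256(code_bytes).hexdigest()
        and body["archive_sha256"] == hashlib.sha256(archive_bytes).hexdigest()
    )
```

A check of this kind only shows that the certificate is internally consistent. It says nothing about whether the committed statistics were measured honestly, which is the open question the review returns to below.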

Core claim

The paper claims that a software-only CUDA probe measures an SM-by-memory-region latency matrix using physical SM labels and dependent global loads. A streaming reducer commits sufficient statistics, configuration, code hashes, network evidence, and a compressed raw data archive into a certificate that a verifier can check without a GPU. This supports three claims: the per-SM latency map is a stable physical fingerprint with median temporal jitter of 0.09 cycles over a six-hour full-load RTX 5090 run and 100.0% shape-only leave-one-out classification accuracy for distinct Blackwell dies; cache-bypassing HBM sweeps recover hardware-class topology across generations including a unified Volta V

What carries the argument

The per-SM latency matrix measured via dependent global loads using physical SM labels, which serves as a stable physical fingerprint.
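A streaming reducer over such a matrix can be sketched with Welford's online update, which keeps a count, mean, and variance per (SM, region) cell without storing raw rows. This is an illustration under assumptions: the row layout `(sm, region, cycles)` and the names `CellStats` and `reduce_rows` are hypothetical, and the paper's actual sufficient statistics may differ.

```python
from collections import defaultdict


class CellStats:
    """Running count, mean and M2 (sum of squared deviations) for one cell."""

    __slots__ = ("n", "mean", "m2")

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def push(self, x: float) -> None:
        # Welford's update: numerically stable and single pass.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Sample variance; zero when fewer than two samples were seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def reduce_rows(rows):
    """Fold an iterable of (sm, region, cycles) rows into per-cell statistics."""
    cells = defaultdict(CellStats)
    for sm, region, cycles in rows:
        cells[(sm, region)].push(float(cycles))
    return cells
```

Because each cell holds only three numbers, the summary stays bounded however long the probe runs.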

If this is right

The per-SM latency map remains stable with median temporal jitter of 0.09 cycles over six-hour full-load runs on RTX 5090.
Shape-only leave-one-out classification separates distinct Blackwell dies with 100.0% accuracy.
Cache-bypassing HBM sweeps recover hardware-class topology across generations, including specific cross-die penalties in Blackwell B200.
169 RIPE Atlas probes localize a B200 server within 44 km of its claimed datacentre and reject all 11 decoy sites.
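One common way to turn landmark round-trip times into a location check is a speed-of-light feasibility bound: a candidate site is rejected if any probe's RTT is too short for the distance between them. The sketch below assumes that rule and a propagation speed of roughly 200 km per millisecond in fibre. Both are assumptions rather than the paper's stated method, and the coordinates used in the usage note are purely illustrative.

```python
import math

FIBRE_KM_PER_MS = 200.0  # ASSUMPTION: about two thirds of c, a common rule of thumb


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms):
    """Upper bound on one-way distance implied by a round-trip time."""
    return rtt_ms / 2.0 * FIBRE_KM_PER_MS


def site_is_feasible(site, probes):
    """site = (lat, lon); probes = [(lat, lon, min_rtt_ms), ...]."""
    return all(
        haversine_km(site[0], site[1], lat, lon) <= max_distance_km(rtt)
        for lat, lon, rtt in probes
    )
```

With a hypothetical probe at (50.11, 8.68) reporting 1 ms and another at (52.37, 4.90) reporting 5 ms, a site at (50.11, 8.68) passes, while a decoy at (48.86, 2.35) fails because 1 ms cannot cover the distance to the first probe.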

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the fingerprint is unforgeable, tenants could continuously monitor jobs to detect any runtime hardware substitution.
The approach might extend to continuous attestation during long-running workloads by re-measuring the latency matrix periodically.
Third-party auditors without GPU access could use the certificates to verify cloud provider claims on hardware class and location.
Similar per-core or per-unit latency patterns could be explored for attestation on non-GPU accelerators with hierarchical memory.
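The periodic re-measurement idea could be sketched as a drift check against a committed baseline matrix. The functions `median_abs_drift` and `still_same_device`, the dictionary layout keyed by (SM, region), and the tolerance are all hypothetical. The paper reports a median jitter figure, but no acceptance threshold is given here, so the tolerance is left as a caller-supplied parameter.

```python
from statistics import median


def median_abs_drift(baseline, current):
    """Median over cells of |current - baseline|, in cycles.

    Both arguments map (sm, region) -> mean latency in cycles.
    """
    if baseline.keys() != current.keys():
        raise ValueError("matrices cover different (SM, region) cells")
    return median(abs(current[k] - baseline[k]) for k in baseline)


def still_same_device(baseline, current, tolerance_cycles):
    """True when the re-measured matrix stays within the chosen tolerance."""
    return median_abs_drift(baseline, current) <= tolerance_cycles
```

A tenant would call `still_same_device` after each re-measurement and treat a failure as a signal to investigate, not as proof of substitution.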

Load-bearing premise

The latency matrix and network landmarks measured by ordinary CUDA code cannot be forged or altered by the cloud provider or hypervisor without detectable changes to the reported statistics or hashes.

What would settle it

A hypervisor that intercepts CUDA calls, supplies a forged latency matrix and network responses matching the expected certificate hashes and statistics, yet runs on different hardware without producing detectable statistical deviations.

Figures

Figures reproduced from arXiv: 2606.24934 by Faruk Alpay, Taylan Alpay.

**Figure 1.** The attestation pipeline. The remote GPU produces raw timing rows; the artifact ships bounded summaries and certificates to verifiers, and the arXiv package carries the compressed raw data archive when it fits the source budget.

**Figure 2.** The on-chip network as a congestion-controlled fabric (RTX PRO 6000, 85 GiB resident). Achieved goodput (left axis) rises with offered concurrency and saturates at the dotted line, 1637 GB/s, past the knee marked by the dashed vertical line near 3008 concurrent warps; the effective per-line service time (right axis) falls from 11.6 to 0.08 ns as concurrency hides latency. The on-chip-network sweep supports…
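The saturation numbers in the Figure 2 caption can be cross-checked with simple arithmetic if one assumes a 128-byte line. That line size is an assumption, since the caption does not state it, and `service_time_ns` is a hypothetical helper written only for this check.

```python
LINE_BYTES = 128  # ASSUMPTION: line size is not stated in the caption


def service_time_ns(goodput_gb_per_s, line_bytes=LINE_BYTES):
    """Effective time per line at a given goodput.

    bytes / (1e9 bytes per second) is seconds, and multiplying by 1e9 gives
    nanoseconds, so bytes divided by GB/s is already a value in nanoseconds.
    """
    return line_bytes / goodput_gb_per_s


saturated = service_time_ns(1637.0)
print(f"{saturated:.4f} ns per line at 1637 GB/s")
```

Under that assumption the saturated goodput works out to roughly 0.078 ns per line, which is consistent with the 0.08 ns quoted in the caption.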

Original abstract

Cloud GPU tenants receive a model name and a region, but cannot directly inspect the physical accelerator that runs their job. We present a software-only attestation primitive for this setting. A CUDA probe measures an SM-by-memory-region latency matrix using physical SM labels and dependent global loads. A streaming reducer commits sufficient statistics, configuration, code hashes, network evidence, and a compressed raw data archive into a certificate that a verifier can check without a GPU. The certificate supports three claims. First, the per-SM latency map is a stable physical fingerprint. Over a six-hour full-load RTX 5090 run, its median temporal jitter is 0.09 cycles, while shape-only leave-one-out classification separates distinct Blackwell dies with 100.0% accuracy. Second, cache-bypassing HBM sweeps recover hardware-class topology across generations, including a unified Volta V100 memory domain, a two-way Hopper H200 L2 split, and a Blackwell B200 two-die NV-HBI package whose 74/74 SM partition carries a 30-cycle, 15.5 ns cross-die penalty. Third, public network landmarks bind the same certificate to a coarse location. In the B200 run, 169 RIPE Atlas probes place the server within 44 km of its claimed datacentre and reject all 11 decoy sites. Together, these measurements check cloud-GPU identity, class, and coarse location without privileged access or a vendor key.
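The abstract quotes the B200 cross-die penalty in two units, 30 cycles and 15.5 ns, and the pair implies a counter frequency. The short sketch below works that out, assuming the cycle counter ticks at a fixed rate; the clock itself is not stated in this summary, and the helper names are hypothetical.

```python
def implied_clock_ghz(cycles, nanoseconds):
    """Cycles per nanosecond is numerically the frequency in GHz."""
    return cycles / nanoseconds


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at a fixed clock."""
    return cycles / clock_ghz


clock = implied_clock_ghz(30, 15.5)
print(f"implied clock: {clock:.3f} GHz")
print(f"30 cycles at that clock: {cycles_to_ns(30, clock):.1f} ns")
```

The two figures are consistent with a counter running at about 1.94 GHz.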

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper gives concrete unprivileged measurements for GPU fingerprinting and topology but leaves the hypervisor forgery question open.

The main thing here is a working CUDA probe that builds an SM-by-memory latency matrix, folds it with code hashes and RIPE Atlas landmarks into a verifier-checkable certificate, and reports stable numbers: 0.09-cycle median jitter over six hours on an RTX 5090, 100% leave-one-out die separation on Blackwell, a measurable 30-cycle cross-die penalty on B200, and 44 km location resolution from 169 probes.

What stands out is the synthesis. Prior work has done latency side-channels or network geolocation separately; putting the per-SM map, cache-bypass topology sweep, and public landmark binding into one unprivileged artifact that a remote verifier can check without a GPU is the new piece. The cross-generation results (V100 single domain, H200 L2 split, B200 NV-HBI) also show the method is not tied to one architecture.

The soft spot is exactly the one the stress test flags. The certificate is only as good as the claim that a hypervisor cannot remap memory, inject delays, or virtualize the scheduler while keeping the reported hashes and statistics intact. The paper shows stability under normal load but supplies no adversarial runs, no hypervisor-controlled experiments, and no argument why the streaming reducer would catch such changes. That is the load-bearing assumption, and it is not yet demonstrated.

The work is aimed at people who run multi-tenant GPU fleets and need something better than vendor model strings for compliance or cost control. It is worth a serious referee because the measurements are specific and the primitive is implementable today; reviewers will focus on the security model rather than whether the basic technique works.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a software-only attestation primitive for cloud GPUs. A CUDA probe constructs an SM-by-memory-region latency matrix via physical SM labels and dependent global loads; a streaming reducer commits sufficient statistics, code hashes, configuration, network evidence, and a compressed archive into a verifiable certificate. The certificate supports three claims: (1) the per-SM latency map is a stable physical fingerprint (0.09-cycle median jitter over 6 h on RTX 5090; 100% shape-only leave-one-out classification on distinct Blackwell dies), (2) cache-bypassing HBM sweeps recover hardware-class topology across V100/H200/B200 generations (including a 30-cycle cross-die penalty on B200), and (3) RIPE Atlas landmarks bind the certificate to coarse location (169 probes place a B200 server within 44 km of its claimed datacentre and reject 11 decoys).
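The shape-only leave-one-out classification in claim (1) could be sketched as follows. This is an illustration under assumptions: "shape-only" is read here as removing each vector's offset and scale, and a nearest-centroid rule stands in for the paper's classifier, which this summary does not specify. All function names are hypothetical.

```python
import math


def shape(vec):
    """Remove offset and scale so only the pattern across SMs remains."""
    mu = sum(vec) / len(vec)
    centred = [v - mu for v in vec]
    norm = math.sqrt(sum(c * c for c in centred)) or 1.0
    return [c / norm for c in centred]


def centroid(vectors):
    """Element-wise mean of equally long vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_out_accuracy(samples):
    """samples: list of (die_label, per-SM latency vector).

    Each sample is held out in turn and assigned to the nearest class
    centroid built from the remaining samples.
    """
    shaped = [(label, shape(vec)) for label, vec in samples]
    correct = 0
    for i, (label, vec) in enumerate(shaped):
        groups = {}
        for other_label, other_vec in shaped[:i] + shaped[i + 1:]:
            groups.setdefault(other_label, []).append(other_vec)
        guess = min(groups, key=lambda g: dist2(vec, centroid(groups[g])))
        correct += guess == label
    return correct / len(shaped)
```

Every die needs at least two samples, otherwise its class disappears when its only sample is held out.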

Significance. If the unforgeability and stability claims hold, the work supplies a practical, vendor-key-free method for tenants to verify GPU identity, class, and location in shared cloud environments. The empirical components—long-duration jitter measurements, cross-generation topology recovery, and public-network landmark binding—are concrete strengths that could be directly reused or extended.

major comments (2)

[Abstract, §3 (certificate construction), §5 (evaluation)] The central attestation claim (that the latency matrix and RIPE Atlas landmarks cannot be forged or altered by a hypervisor without detectable changes to hashes or statistics) is load-bearing yet unsupported by any adversarial evaluation. No section examines attacks such as memory-region remapping, scheduler virtualization, or controlled per-load delay injection that preserve the reported CUDA code hash, configuration, and streaming-reducer statistics.
[§4.1, §4.2] §4.1 and §4.2 report concrete stability figures (0.09-cycle median jitter, 100% leave-one-out accuracy, 30-cycle cross-die penalty) without error bars, full exclusion criteria, or statistical tests on the underlying distributions; this directly affects the reliability of the fingerprint and topology claims.

minor comments (2)

[Figures 3–5, Table 2] Figure captions and tables should explicitly state the number of independent runs and any filtering applied to the latency samples.
[§2.2] Notation for SM labels and memory regions is introduced without a consolidated glossary; a short table mapping labels to hardware units would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the attestation claims and the statistical presentation of results. We address each major comment below, indicating planned revisions.

Referee: [Abstract, §3 (certificate construction), §5 (evaluation)] The central attestation claim (that the latency matrix and RIPE Atlas landmarks cannot be forged or altered by a hypervisor without detectable changes to hashes or statistics) is load-bearing yet unsupported by any adversarial evaluation. No section examines attacks such as memory-region remapping, scheduler virtualization, or controlled per-load delay injection that preserve the reported CUDA code hash, configuration, and streaming-reducer statistics.

Authors: We agree that the manuscript does not contain adversarial evaluations against hypervisor attacks such as memory remapping or delay injection. The work centers on the construction of the certificate from unprivileged CUDA measurements and its observed stability and topology properties under normal execution; the code hashes and reducer statistics are included to enable detection of gross tampering, but no claim of resistance to the specific attacks listed is supported by experiments. In revision we will add a limitations subsection to §5 that explicitly enumerates these attack vectors, clarifies the scope of the current empirical claims, and identifies them as directions for future adversarial analysis. revision: yes

Referee: [§4.1, §4.2] §4.1 and §4.2 report concrete stability figures (0.09-cycle median jitter, 100% leave-one-out accuracy, 30-cycle cross-die penalty) without error bars, full exclusion criteria, or statistical tests on the underlying distributions; this directly affects the reliability of the fingerprint and topology claims.

Authors: We accept that the reported figures would be strengthened by additional statistical detail. In the revised manuscript we will update §4.1 and §4.2 to include error bars (standard deviation and interquartile range) on the jitter and cross-die penalty measurements, provide an explicit account of sample exclusion criteria, and add statistical tests (e.g., two-sample Kolmogorov-Smirnov tests on latency distributions and bootstrap confidence intervals on classification accuracy) to support the leave-one-out results and generational topology distinctions. revision: yes
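The two kinds of test promised in this response can be sketched with the standard library alone. The following is a hedged illustration, not the authors' analysis: `bootstrap_ci` is a plain percentile bootstrap, `ks_statistic` computes only the two-sample Kolmogorov-Smirnov distance (no p-value), and any numbers passed in are assumed to be the caller's own samples.

```python
import random
from statistics import median


def bootstrap_ci(samples, stat=median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = sorted(stat(rng.choices(samples, k=n)) for _ in range(n_resamples))
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x, then compare the CDF heights.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

A statistic of 0 means the two empirical distributions coincide, and 1 means they do not overlap at all. Turning the statistic into a significance level would need the sample sizes and a reference distribution, which this sketch leaves out.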

Circularity Check

0 steps flagged

No circularity: claims rest on direct empirical measurements

The paper presents empirical observations from CUDA probes (SM-by-memory latency matrices, temporal jitter of 0.09 cycles, leave-one-out classification accuracy, HBM topology sweeps, and RIPE Atlas network landmarks) without any derivation chain, equations, or first-principles predictions. No step reduces a claimed result to fitted parameters, self-definitions, or self-citations by construction; the reported fingerprints and topology recoveries are direct measurements rather than quantities defined by the same data. The central claims are therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on empirical stability of latency patterns rather than formal axioms or derivations; no free parameters are explicitly fitted in the abstract, though classification thresholds are implicit.

axioms (1)

domain assumption: The latency matrix measured by user-level CUDA code reflects stable physical properties of the GPU die and memory system.
Invoked to support the fingerprint and topology claims.

