pith. sign in

arxiv: 2606.19262 · v1 · pith:BJDKXSVRnew · submitted 2026-06-17 · 💻 cs.LG

Detecting Hidden ML Training With Zero-Overhead Telemetry

Pith reviewed 2026-06-26 21:01 UTC · model grok-4.3

classification 💻 cs.LG
keywords GPU telemetryML workload detectionadversarial robustnesszero-overhead monitoringAI compute governanceNVML signalsworkload classification
0
0 comments X

The pith

Zero-overhead GPU telemetry identifies training workloads at 98.2 percent accuracy even after adversarial evasion attempts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that basic hardware signals from GPUs can distinguish machine learning training from other workloads without accessing any model details or data. This approach uses only existing zero-overhead telemetry that observes physical effects of computation. The authors test the resulting classifier against 20 families of evasion strategies over five rounds of attack and defense on nine different GPU models. If the results hold, monitoring schemes for AI compute governance become more feasible because they resist common attempts to hide training activity.

Core claim

A classifier built on content-agnostic NVML telemetry signals achieves 98.2 percent binary accuracy at identifying training workloads across the corpus and retains 43 to 87 percent accuracy against the hardest adversarially disguised unexpected workloads after evaluation of 20 evasion strategy families across 5 monitor-evader iterations on 9 GPU models spanning 4 architecture generations.

What carries the argument

The NVML telemetry classifier that maps zero-overhead, privacy-preserving hardware signals to workload labels without inspecting model weights, data, or hyperparameters.

If this is right

  • Governance systems can monitor large GPU fleets using only existing hardware interfaces without added cost or privacy risk.
  • Detection performance degrades but remains above random against repeated adversarial adaptation.
  • The same signals work across four generations of GPU architectures.
  • Unexpected workloads remain detectable even when operators attempt to disguise them as non-training tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar telemetry on other accelerators could support comparable detection if equivalent low-level counters exist.
  • Combining these signals with network or power data might raise accuracy further without changing the zero-overhead property.
  • Mandating telemetry access on shared clusters could shift the cost of evasion onto operators who want to hide activity.

Load-bearing premise

The 20 evasion strategy families evaluated across 5 monitor-evader iterations are assumed to cover the strongest practical attacks that could be mounted against the telemetry-based detector in real deployments.

What would settle it

A workload using an evasion technique outside the 20 families that drops the classifier accuracy below 43 percent on the most challenging test cases would falsify the robustness claim.

Figures

Figures reproduced from arXiv: 2606.19262 by Robi Rahman, Sabiha Tajdari.

Figure 1
Figure 1. Figure 1: Cross-GPU transfer heatmap (binary RF, 30 s windows). Cell values are accuracy when training on the row GPU and testing on the column GPU. Datacenter-to-datacenter transfer is consis￾tently above 90%. Cross-consumer and consumer↔datacenter transfer is weaker; RTX 4050/5080/5090 rows often drop below 50% on datacenter targets. The bottom row is LOGO: for each test GPU, the classifier is trained only on data… view at source ↗
Figure 2
Figure 2. Figure 2: H100-NVL roofline with the 286 corner-case inference configurations overlaid on a dense-FP32 training reference sweep. Color-coded by workload family (CNN inference, LLM prefill, LLM decode, quantisation variants, forward-only training variant, ViT inference). The CNN-inference, LLM-prefill, and quantisation clouds land squarely inside the training arithmetic-intensity band, confirming visually that static… view at source ↗
Figure 3
Figure 3. Figure 3: Accuracy vs. window size. Finally, we only tested a limited range of workloads and adversarial strategies. Real datacenters run a wider variety of applications than we emulated, with notable gaps including multi-user inference serving, data processing and ETL pipelines, and predictive analytics. Although we tried to cover many evasion strategies and edge cases, real evaders might attempt a much wider range… view at source ↗
Figure 4
Figure 4. Figure 4: Detection rate in LOO evaluation by strategy family. Bar chart showing 20 families sorted by detection rate, with error bars showing min/max across variants within each family [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗
read the original abstract

Hardware-enabled monitoring of GPU workloads underpins many proposals for AI compute governance, but if developers can defeat monitoring mechanisms, such schemes are unworkable. We evaluate the adversarial robustness of GPU workload classification using only zero-overhead, privacy-preserving NVML telemetry: content-agnostic signals that observe physical effects of computation without accessing model weights, training data, or hyperparameters. Across 5 rounds of monitor-evader iteration, we evaluate 20 evasion strategy families on 9 GPU models spanning 4 architecture generations. We develop a classifier that achieves 98.2% binary accuracy at identifying training workloads across the whole corpus, and 43-87% accuracy against the most challenging unexpected workloads even when they are adversarially disguised.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper evaluates the adversarial robustness of a GPU workload classifier that uses only zero-overhead, content-agnostic NVML telemetry (power, utilization, temperature, clock rates) to detect ML training workloads. It reports 98.2% binary accuracy on a corpus of workloads and 43-87% accuracy on the most challenging unexpected workloads after 5 rounds of monitor-evader iteration over 20 evasion strategy families evaluated on 9 GPU models across 4 architecture generations.

Significance. If the robustness results hold under stronger scrutiny of the evaluation methodology, the work would provide concrete evidence that privacy-preserving hardware telemetry can support AI compute governance proposals by resisting practical evasion attempts without requiring access to model weights or data.

major comments (2)
  1. [Abstract] Abstract: the headline performance claims (98.2% binary accuracy; 43-87% on adversarially disguised workloads) are presented without any description of dataset construction, cross-validation, statistical tests, or how the accuracy ranges were aggregated, rendering the central empirical result unverifiable from the provided text.
  2. [Abstract] Abstract: the robustness claim rests on the untested assumption that the 20 evaluated evasion strategy families (iterated 5 times) adequately sample the space of strongest practical attacks against NVML signals; no theoretical coverage argument, exhaustive enumeration, or architecture-specific analysis is supplied to bound the possibility of stronger untested attacks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the headline performance claims (98.2% binary accuracy; 43-87% on adversarially disguised workloads) are presented without any description of dataset construction, cross-validation, statistical tests, or how the accuracy ranges were aggregated, rendering the central empirical result unverifiable from the provided text.

    Authors: We agree the abstract omits these details for brevity. Dataset construction, cross-validation, and aggregation of the 43-87% range (derived from the final round of the 5-iteration process across 9 GPUs) are fully described in Sections 3 and 4. We will revise the abstract to add one sentence summarizing the corpus size, evaluation protocol, and range aggregation method. revision: yes

  2. Referee: [Abstract] Abstract: the robustness claim rests on the untested assumption that the 20 evaluated evasion strategy families (iterated 5 times) adequately sample the space of strongest practical attacks against NVML signals; no theoretical coverage argument, exhaustive enumeration, or architecture-specific analysis is supplied to bound the possibility of stronger untested attacks.

    Authors: The 20 families were selected to span the main practical categories observable in NVML (power, clock, thermal, utilization, and scheduling manipulations) and were stress-tested via 5 rounds of adaptive iteration. No theoretical coverage bound or exhaustive enumeration is provided, as the space of possible evasions is open-ended. We will add an explicit limitations paragraph acknowledging this and noting that stronger untested attacks remain possible. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation

full rationale

The paper reports measured classifier accuracies (98.2% binary, 43-87% on evasive workloads) obtained by training and testing on collected NVML telemetry traces across 9 GPUs and 20 explicitly enumerated evasion families. No equations, fitted parameters, or self-citations appear in the provided text that would reduce these reported numbers to definitions or inputs internal to the paper itself. The evaluation is self-contained against the concrete corpus and attack families that were actually run; any concern about whether those 20 families exhaust the space of possible attacks is a question of experimental coverage rather than circular reduction of the claimed results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no mathematical model, parameters, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5646 in / 941 out tokens · 23835 ms · 2026-06-26T21:01:54.476893+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

59 extracted references · 1 canonical work pages

  1. [7]

    Proceedings of the 42nd International Conference on Machine Learning (ICML) , year =

    O'Gara, Aidan and Kulp, Gabriel and Hodgkins, Will and Petrie, James and Immler, Vincent and Aysu, Aydin and Basu, Kanad and Bhasin, Shivam and Picek, Stjepan and Srivastava, Ankur , title =. Proceedings of the 42nd International Conference on Machine Learning (ICML) , year =

  2. [8]

    2025 , url =

    Aarne, Onni and Fist, Tim and Withers, Caleb , title =. 2025 , url =

  3. [9]

    2025 , url =

    Kulp, Gabriel and Siu, Daniel and Heim, Lennart , title =. 2025 , url =

  4. [10]

    2025 , note =

    Petrie, James , title =. 2025 , note =

  5. [12]

    2025 , url =

    Brass, Asher and Aarne, Onni , title =. 2025 , url =

  6. [13]

    2023 , url =

    Fist, Tim and Grunewald, Erich , title =. 2023 , url =

  7. [14]

    Working paper , year =

    Seferis, Emmanouil and Fist, Tim , title =. Working paper , year =

  8. [16]

    Proceedings of the Cray User Group (CUG) , year =

    Weakley, Le Mai and Michael, Scott and Thota, Abhinav and Huber, Laura and Fulton, Ben and Kusz, Matthew , title =. Proceedings of the Cray User Group (CUG) , year =

  9. [17]

    Proceedings of the ACM , year =

    Xu, Haoxuan and Gong, Chen and Liu, Beijie and Zheng, Haizhong and Chen, Beidi and Li, Mengyuan , title =. Proceedings of the ACM , year =

  10. [23]

    Working paper , year =

    Esposito, Giuseppe and Guerrero-Balaguera, Juan-David and Condia, Josie Esteban Rodriguez and Reorda, Matteo Sonza and Barbiero, Marco and Fortuna, Rossella , title =. Working paper , year =

  11. [24]

    31st USENIX Security Symposium (USENIX Security 22) , year =

    Maia, Henrique Teles and Xiao, Chang and Li, Dingzeyu and Grinspun, Eitan and Zheng, Changxi , title =. 31st USENIX Security Symposium (USENIX Security 22) , year =

  12. [25]

    Proceedings of the 15th ACM Conference on Data and Application Security and Privacy (CODASPY) , year =

    Arefin, Sayed Erfan and Serwadda, Abdul , title =. Proceedings of the 15th ACM Conference on Data and Application Security and Privacy (CODASPY) , year =

  13. [26]

    and Carbone, Matthew R

    Latif, Imran and Newkirk, Alex C. and Carbone, Matthew R. and Munir, Arslan and Lin, Yuewei and Koomey, Jonathan and Yu, Xi and Dong, Zhihua , title =. 2025 , url =

  14. [28]

    Proceedings of the 53rd International Symposium on Computer Architecture (ISCA) , year =

    Anonymous , title =. Proceedings of the 53rd International Symposium on Computer Architecture (ISCA) , year =

  15. [29]

    Machine Learning , volume =

    Breiman, Leo , title =. Machine Learning , volume =

  16. [30]

    Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) , pages =

    Chen, Tianqi and Guestrin, Carlos , title =. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) , pages =

  17. [31]

    Communications of the ACM , volume =

    Williams, Samuel and Waterman, Andrew and Patterson, David , title =. Communications of the ACM , volume =

  18. [33]

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

    He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

  19. [34]

    Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT) , pages =

    Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina , title =. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT) , pages =

  20. [35]

    Executive Order 14110: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence , year =

  21. [36]

    2025 , url =

    You, Josh and Owen, David , title =. 2025 , url =

  22. [37]

    2025 , url =

    Data on. 2025 , url =

  23. [38]

    Secure, governable chips: Using on-chip mechanisms to manage national security risks from AI and advanced computing

    Aarne, O., Fist, T., and Withers, C. Secure, governable chips: Using on-chip mechanisms to manage national security risks from AI and advanced computing. Technical report, Center for a New American Security, 2025. URL https://www.cnas.org/publications/reports/secure-governable-chips

  24. [39]

    D., Ponomarev, D., and Abu-Ghazaleh, N

    Almusaddar, G., Zhang, Y., Ganjisaffar, S., Williams, B., Liu, Y. D., Ponomarev, D., and Abu-Ghazaleh, N. ShadowScope : Composable performance-counter monitoring for GPU computation integrity. arXiv preprint arXiv:2509.00300, 2025. URL https://arxiv.org/abs/2509.00300

  25. [40]

    Differential architecture: Hardware mechanisms to limit the performance of targeted applications

    Anonymous. Differential architecture: Hardware mechanisms to limit the performance of targeted applications. In Proceedings of the 53rd International Symposium on Computer Architecture (ISCA), 2026. Forthcoming

  26. [41]

    Arefin, S. E. and Serwadda, A. Exploiting HDMI and USB ports for GPU side-channel insights. In Proceedings of the 15th ACM Conference on Data and Application Security and Privacy (CODASPY), 2025. arXiv:2410.02539

  27. [42]

    Verifying international agreements on AI : Six layers of verification for rules on large-scale AI development and deployment

    Baker, M., Kulp, G., Marks, O., Brundage, M., and Heim, L. Verifying international agreements on AI : Six layers of verification for rules on large-scale AI development and deployment. arXiv preprint arXiv:2507.15916, 2025. URL https://arxiv.org/abs/2507.15916

  28. [43]

    and Aarne, O

    Brass, A. and Aarne, O. Location verification for AI chips. Technical report, Institute for AI Policy and Strategy, 2025. URL https://www.iaps.ai/research/location-verification-for-ai-chips

  29. [44]

    Random forests

    Breiman, L. Random forests. Machine Learning, 45 0 (1): 0 5--32, 2001

  30. [45]

    and Guestrin, C

    Chen, T. and Guestrin, C. XGBoost : A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp.\ 785--794, 2016

  31. [46]

    Electricity demand and grid impacts of AI data centers: Challenges and prospects

    Chen, X., Wang, X., Colacelli, A., Lee, M., and Xie, L. Electricity demand and grid impacts of AI data centers: Challenges and prospects. arXiv preprint arXiv:2509.07218, 2025 a . URL https://arxiv.org/abs/2509.07218

  32. [47]

    Chen, Z., Chien, S. W. D., Qian, P., and Zilberman, N. Hardware-centric, workload-agnostic anomaly detection for GPU computing. arXiv preprint arXiv:2510.26008, 2025 b . URL https://arxiv.org/abs/2510.26008

  33. [48]

    BERT : Pre-training of deep bidirectional transformers for language understanding

    Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT : Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pp.\ 4171--4186, 2019

  34. [49]

    Data on AI chip sales, 2025

    Epoch AI . Data on AI chip sales, 2025. URL https://epoch.ai/data/ai-chip-sales

  35. [50]

    Esposito, G., Guerrero-Balaguera, J.-D., Condia, J. E. R., Reorda, M. S., Barbiero, M., and Fortuna, R. Estimating GPU stress and reliability across workloads from telemetry and performance counters. Working paper, 2025

  36. [51]

    and Grunewald, E

    Fist, T. and Grunewald, E. Preventing AI chip smuggling to C hina: A working paper. Technical report, Center for a New American Security, 2023. URL https://www.cnas.org/publications/reports/preventing-ai-chip-smuggling-to-china

  37. [52]

    G., Lain, G., and Conti, M

    Gangwal, A., Piazzetta, S. G., Lain, G., and Conti, M. Detecting covert cryptomining using HPC . arXiv preprint arXiv:1909.00268, 2019. URL https://arxiv.org/abs/1909.00268

  38. [53]

    Deep residual learning for image recognition

    He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 770--778, 2016

  39. [54]

    and Koessler, L

    Heim, L. and Koessler, L. Training compute thresholds: Features and functions in AI regulation. arXiv preprint arXiv:2405.10799, 2024. URL https://arxiv.org/abs/2405.10799

  40. [55]

    A., and Zilberman, N

    Heim, L., Fist, T., Egan, J., Huang, S., Zekany, S., Trager, R., Osborne, M. A., and Zilberman, N. Governing through the cloud: The intermediary role of compute providers in AI regulation. arXiv preprint arXiv:2403.08501, 2024. URL https://arxiv.org/abs/2403.08501

  41. [56]

    B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D

    Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020. URL https://arxiv.org/abs/2001.08361

  42. [57]

    Hardware-enabled governance mechanisms: Developing technical solutions to exempt items otherwise classified under export control classification numbers 3A090 and 4A090

    Kulp, G., Siu, D., and Heim, L. Hardware-enabled governance mechanisms: Developing technical solutions to exempt items otherwise classified under export control classification numbers 3A090 and 4A090 . Technical report, RAND Corporation, 2025. URL https://www.rand.org/pubs/working_papers/WRA3056-1.html

  43. [58]

    C., Carbone, M

    Latif, I., Newkirk, A. C., Carbone, M. R., Munir, A., Lin, Y., Koomey, J., Yu, X., and Dong, Z. Empirical power signatures of large-scale training on 8- GPU H100 nodes. Technical report, Office of Scientific and Technical Information (OSTI), 2025. URL https://www.osti.gov/biblio/2555926

  44. [59]

    T., Xiao, C., Li, D., Grinspun, E., and Zheng, C

    Maia, H. T., Xiao, C., Li, D., Grinspun, E., and Zheng, C. Can one hear the shape of a neural network? S nooping the GPU via magnetic side channel. In 31st USENIX Security Symposium (USENIX Security 22), pp.\ 4383--4400. USENIX Association, 2022

  45. [60]

    K., Ganji, F., Holcomb, D., and Tajik, S

    Monfared, S. K., Ganji, F., Holcomb, D., and Tajik, S. Detecting ML training under untrusted host and device via timing and memory observables. arXiv preprint arXiv:2602.09369, 2026. URL https://arxiv.org/abs/2602.09369

  46. [61]

    Hardware-enabled mechanisms for verifying responsible AI development

    O'Gara, A., Kulp, G., Hodgkins, W., Petrie, J., Immler, V., Aysu, A., Basu, K., Bhasin, S., Picek, S., and Srivastava, A. Hardware-enabled mechanisms for verifying responsible AI development. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025. URL https://arxiv.org/abs/2505.03742. arXiv:2505.03742

  47. [62]

    POLCA : Power oversubscription in LLM cloud providers

    Patel, P., Choukse, E., Zhang, C., Goiri, \'I ., Warrier, B., Mahalingam, N., and Bianchini, R. POLCA : Power oversubscription in LLM cloud providers. arXiv preprint arXiv:2308.12908, 2023. URL https://arxiv.org/abs/2308.12908

  48. [63]

    Offline licensing: A proposal for authorizing AI chip operation

    Petrie, J. Offline licensing: A proposal for authorizing AI chip operation. Technical report, Institute for AI Policy and Strategy, 2025. Working paper

  49. [64]

    Flexible hardware-enabled guarantees for AI compute

    Petrie, J., Aarne, O., Ammann, N., and Dalrymple, D. Flexible hardware-enabled guarantees for AI compute. arXiv preprint arXiv:2506.03409, 2025. URL https://arxiv.org/abs/2506.03409

  50. [65]

    K., Ngo, R., Pilz, K., Gor, G., Bluemke, E., Shoker, S., Egan, J., Trager, R

    Sastry, G., Heim, L., Belfield, H., Anderljung, M., Brundage, M., Hazell, J., O'Keefe, C., Hadfield, G. K., Ngo, R., Pilz, K., Gor, G., Bluemke, E., Shoker, S., Egan, J., Trager, R. F., Avin, S., Weller, A., Bengio, Y., and Coyle, D. Computing power and the governance of artificial intelligence. arXiv preprint arXiv:2402.08797, 2024. URL https://arxiv.org...

  51. [66]

    An international agreement to prevent the premature creation of artificial superintelligence

    Scher, A., Abecassis, D., Barnett, P., and Abeyta, B. An international agreement to prevent the premature creation of artificial superintelligence. arXiv preprint arXiv:2511.10783, 2025. URL https://arxiv.org/abs/2511.10783

  52. [67]

    and Fist, T

    Seferis, E. and Fist, T. Detecting compute structuring: Distributing training runs across providers. Working paper, 2025. URL https://openreview.net/forum?id=qseqw1sWzz

  53. [68]

    What does it take to catch a C hinchilla? V erifying rules on large-scale neural network training via compute monitoring

    Shavit, Y. What does it take to catch a C hinchilla? V erifying rules on large-scale neural network training via compute monitoring. arXiv preprint arXiv:2303.11341, 2023. URL https://arxiv.org/abs/2303.11341

  54. [69]

    J., Chen, Q., Weiss, M

    Tang, B. J., Chen, Q., Weiss, M. L., Frey, N., McDonald, J., Bestor, D., Yee, C., Arcand, W., Byun, C., Edelman, D., Hubbell, M., Jones, M., Kepner, J., Klein, A., Michaleas, A., Michaleas, P., Milechin, L., Mullen, J., Prout, A., Reuther, A., Rosa, A., Bowne, A., McEvoy, L., Li, B., Tiwari, D., Gadepally, V., and Samsi, S. The MIT S upercloud workload cl...

  55. [70]

    Executive order 14110: Safe, secure, and trustworthy development and use of artificial intelligence, 2023

    The White House . Executive order 14110: Safe, secure, and trustworthy development and use of artificial intelligence, 2023. URL https://www.federalregister.gov/documents/2023/11/01/2023-24283/. Federal Register 88 FR 75191

  56. [71]

    M., Michael, S., Thota, A., Huber, L., Fulton, B., and Kusz, M

    Weakley, L. M., Michael, S., Thota, A., Huber, L., Fulton, B., and Kusz, M. Classifying HPC and machine learning workloads from system telemetry. In Proceedings of the Cray User Group (CUG), 2025. URL https://cug.org/proceedings/cug2023_proceedings/cug2023_doi/papers_pages/pap139.html

  57. [72]

    Roofline: An insightful visual performance model for multicore architectures

    Williams, S., Waterman, A., and Patterson, D. Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM, 52 0 (4): 0 65--76, 2009

  58. [73]

    WAVE : Workload-aware verification via performance-counter evidence for GPU inference

    Xu, H., Gong, C., Liu, B., Zheng, H., Chen, B., and Li, M. WAVE : Workload-aware verification via performance-counter evidence for GPU inference. In Proceedings of the ACM, 2026. URL https://dl.acm.org/doi/10.1145/3779212.3790247. doi:10.1145/3779212.3790247

  59. [74]

    and Owen, D

    You, J. and Owen, D. Scaling intelligence: The exponential growth of AI 's power needs. Technical report, EPRI, 2025. URL https://www.epri.com/research/products/000000003002033669