arxiv: 2605.16255 · v1 · pith:6EIIJK7Snew · submitted 2026-05-15 · 💻 cs.DC · cs.AI

Designing Datacenter Power Delivery Hierarchies for the AI Era

Pith reviewed 2026-05-19 18:19 UTC · model grok-4.3

classification 💻 cs.DC cs.AI
keywords datacenter power delivery · AI accelerators · power stranding · deployable capacity · power density · oversubscription · deployment sequences

The pith

AI datacenters must prioritize deployable capacity over time instead of installed megawatts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework for assessing power delivery hierarchies in datacenters facing increasing AI accelerator densities. It evaluates designs by simulating sequences of deployments and measuring metrics like throughput, power usage, and costs based on production data. This approach reveals that stranding of power resources significantly impacts what capacity can actually be used. A reader would care because power from the grid is limited and designs need to support multiple hardware generations efficiently. The work shifts the focus to long-term deployable performance as the key objective.

Core claim

The central claim is that for AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time. The framework combines projection models for GPU, compute, and storage with operational factors to evaluate designs over realistic sequences, showing that multi-resource stranding changes deployable capacity, effective capital expenditure, and delivered performance.

What carries the argument

A framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences.
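To illustrate why evaluation over a sequence differs from a static capacity count, here is a minimal Python sketch. It is not the paper's framework: the `Deployment` type, the `simulate` function, the first-fit placement rule, and all numbers are assumptions made for illustration, and the sketch ignores redundancy, oversubscription, and non-power resources.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Deployment:
    arrive: int       # time step at which the deployment arrives
    depart: int       # time step at which it is decommissioned
    power_kw: float   # power reserved while deployed

def simulate(deployments, hall_capacity_kw, num_halls):
    """Place deployments first-fit in arrival order; return (placed, rejected)."""
    load = [0.0] * num_halls   # reserved power per hall
    live = []                  # (depart, hall, power_kw) of active deployments
    placed, rejected = [], []
    for d in sorted(deployments, key=lambda x: x.arrive):
        # Decommission everything that has departed by this arrival time.
        for depart, hall, power in live:
            if depart <= d.arrive:
                load[hall] -= power
        live = [x for x in live if x[0] > d.arrive]
        # First-fit (an assumed policy): first hall with enough residual capacity.
        for hall in range(num_halls):
            if load[hall] + d.power_kw <= hall_capacity_kw:
                load[hall] += d.power_kw
                live.append((d.depart, hall, d.power_kw))
                placed.append(d)
                break
        else:
            rejected.append(d)
    return placed, rejected

# Hypothetical sequence: two 1,000 kW halls, four 600 kW deployments.
sequence = [Deployment(0, 10, 600), Deployment(1, 10, 600),
            Deployment(2, 10, 600), Deployment(10, 20, 600)]
placed, rejected = simulate(sequence, hall_capacity_kw=1000, num_halls=2)
print(len(placed), "placed;", len(rejected), "rejected")
```

In this toy sequence the third deployment is rejected even though 800 kW is idle across the two halls, because no single hall has 600 kW free. The fourth is placed only after earlier deployments are decommissioned. Installed capacity is the same throughout; what can be deployed depends on the sequence.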

Load-bearing premise

The projection models for deployments and the operational factors from production data accurately represent the joint effects of topology, granularity, policy, oversubscription, and workload over changing sequences.

What would settle it

Measuring the actual deployable capacity and utilization in an AI datacenter over multiple years and comparing it to the framework's predictions would test the central claim.

Figures

Figures reproduced from arXiv: 2605.16255 by Alok Gautam Kumbhare, Chaojie Zhang, Fiodar Kazhamiaka, Grant Wilkins, Ricardo Bianchini.

Figure 1
P99 of rack power density since 2020 for datacenter deployments, showing distinct accelerator generations and a widening gap between GPU and non-GPU power density. Density is normalized to the maximum P99 value observed in each quarter at Azure.
Figure 2
Each marker represents a combination of datacenter design, workload, and rack density projections. Colors represent different LLM mixture-of-experts models being served across the fleet. Each combination is analyzed with our framework and compared on throughput of LLM inference per watt versus effective fleet cost. Highlighted points represent diverse workloads, labels describe the points’ design an…
Figure 3
Example of major components in a datacenter power-delivery hierarchy, from grid and generator/battery down to the rack level. Generic diagram not showing redundancy. The paper uses "line-up" to mean a common upstream electrical branch: a set of rows or racks that share the same upstream power-delivery equipment.
Figure 5
CDF of UPS stranding under (a) single-hall Monte Carlo analysis and (b) the final state of an 8-year fleet-scale lifecycle simulation. The local view suggests that 4N/3 and 3+1 are similar. The lifecycle simulation separates them: 3+1 develops higher tail stranding and requires additional halls to serve the same deployed demand.
Figure 6
Single-hall, single-SKU stranding under increasing deployment power. Each experiment fills one hall with repeated deployments of the same SKU and reports the capacity left undeployable at saturation. Distributed redundancy (4N/3) strands capacity when too few parents have enough simultaneous failover headroom. Block redundancy (3+1) strands capacity at divisibility thresholds of the line-up or UPS-block …
Figure 7
Line-up-level stranding in Monte Carlo simulation of a 10N/8 and 8+2 hall populated with storage, compute, and GPU racks under four online placement policies. Variance minimization yields the lowest stranding.
Figure 9
Validation of our simulator against historical rack placements in Azure over 6 years to a subset of both new and mature data halls, and comparing the simulated unused-power distribution to the observed one. Unused power is normalized by the maximum observed value. We report unused rather than stranded power because some halls are not yet saturated.
Figure 11
Normalized rack-power distributions for Azure general-compute and storage deployments since 2023. These results are clustered into empirical distributions of representative SKU groups for future trace generation.
Figure 12
Projected power-density trajectories for GPU racks, GPU pods, CPU compute racks, and storage racks.
Figure 13
Tail (P90) site stranding over time for block-redundant (3+1, 8+2) and distributed (4N/3, 10N/8) designs under Low, Medium, and High GPU TDP projections. Lines show the median across pod compositions (3–7 racks); bands span the min–max range. Designs that appear similar under static capacity metrics separate once evaluated over the deployment lifecycle.
Figure 14
Incremental effective cost above each design’s base $/W. Bars decompose this excess into reserve cost and stranding-induced cost. Error bars show standard deviation across pod compositions. The main moving term is the cost of stranded capacity, not the nominal cost of reserve.
Figure 15
P90 tail stranding versus effective per-domain deployment power for 3+1 and 4N/3 across all GPU TDP scenarios and pod compositions. Dashed vertical lines mark 2.5 MW UPS-block quantization thresholds, around which 3+1 exhibits pronounced stranding increases.
Figure 18
Figure 18 shows that the crossover depends on both workload and hierarchy. For smaller models, most communication is already contained within a rack-scale domain and pods have little to offer for serving throughput, but still incur a placement penalty, so payoff remains near zero or negative. As model size grows, more EP traffic spills across domains, and payoff becomes positive. The crossover is also topology-dep…
Original abstract

Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1MW per deployment by 2027. This poses a major challenge for datacenter power delivery designers. As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned. Designs must remain efficient over long datacenter lifetimes and multiple hardware generations. Power utilization is particularly important as grid power capacity is a scarce resource in the AI era. Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix. Moreover, each of these factors evolve over time, have inter-dependencies across multiple resource dimensions, and generally do not lend themselves to closed-form analysis. To address this challenge, we develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences. The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure. Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance, and quantify how rising density from rack- and pod-scale AI systems shapes these outcomes. For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.
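The abstract's definition of stranding can be made concrete with a small arithmetic sketch. This is a deliberately simplified illustration, not the paper's model: it assumes one power domain of fixed capacity filled with identical racks, so the only source of stranding is that racks arrive in whole units. The function name `stranded_power_kw` and the rack sizes are hypothetical; the 2,500 kW domain echoes the 2.5 MW UPS-block size mentioned in the Figure 15 caption.

```python
def stranded_power_kw(domain_capacity_kw: float, rack_power_kw: float) -> float:
    """Capacity left undeployable when a domain is filled with identical racks."""
    if rack_power_kw <= 0:
        raise ValueError("rack power must be positive")
    racks = int(domain_capacity_kw // rack_power_kw)  # only whole racks fit
    return domain_capacity_kw - racks * rack_power_kw

# Hypothetical 2,500 kW domain at three illustrative rack densities.
for rack_kw in (50, 130, 600):
    stranded = stranded_power_kw(2500, rack_kw)
    print(f"{rack_kw} kW racks: {stranded:.0f} kW stranded")
```

With these assumed numbers, 50 kW racks divide the domain evenly, 130 kW racks leave 30 kW unusable, and 600 kW racks leave 100 kW unusable. The provisioned capacity is identical in all three cases; only the deployable share changes with density.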

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript develops a framework for evaluating datacenter power delivery hierarchies under rising AI accelerator densities (projected to approach 1 MW per deployment by 2027). It integrates projection models for GPU/compute/storage deployments with operational factors from Microsoft Azure production data to compute throughput, power, and cost metrics across realistic arrival, oversubscription, and decommissioning sequences. The central claim is that multi-resource stranding materially alters deployable capacity, effective capex, and delivered performance, so the relevant planning objective is deployable capacity over time rather than installed megawatts.

Significance. The result, if it holds, reframes a practical design problem for long-lived datacenters facing scarce grid capacity. Grounding the operational factors in production traces and handling joint evolution of topology, granularity, policy, and workload mix are genuine strengths that could influence how operators and architects prioritize power hierarchies. The absence of disclosed quantitative outputs, sensitivity results, or external validation in the provided description, however, limits the immediate weight of the conclusions.

major comments (2)
  1. The central claim that multi-resource stranding materially changes deployable capacity, capex, and performance rests on the projection models correctly encoding inter-dependencies among electrical topology, deployment granularity, placement policy, oversubscription, and workload mix across evolving sequences. The manuscript provides no sensitivity analysis to alternative arrival/decommissioning sequences or workload-mix assumptions, nor any comparison against external traces, leaving open the possibility that reported stranding effects are artifacts of the chosen Microsoft-Azure-derived sequences rather than a general property of power hierarchies.
  2. No validation details, error analysis, or robustness checks against variations in oversubscription factors and placement policies are supplied. Because these parameters directly affect the joint resource stranding calculations that drive the shift from installed MW to deployable capacity, their omission is load-bearing for the quantitative results.
minor comments (1)
  1. The abstract states high-level outcomes without any numerical values, confidence intervals, or table references; adding at least one concrete example (e.g., percentage change in deployable capacity for a 2027 density scenario) would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for acknowledging the strengths of grounding the framework in production traces and jointly modeling evolving factors. We agree that additional sensitivity and validation details will strengthen the quantitative claims. We respond to each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: The central claim that multi-resource stranding materially changes deployable capacity, capex, and performance rests on the projection models correctly encoding inter-dependencies among electrical topology, deployment granularity, placement policy, oversubscription, and workload mix across evolving sequences. The manuscript provides no sensitivity analysis to alternative arrival/decommissioning sequences or workload-mix assumptions, nor any comparison against external traces, leaving open the possibility that reported stranding effects are artifacts of the chosen Microsoft-Azure-derived sequences rather than a general property of power hierarchies.

    Authors: We acknowledge that the submitted manuscript does not present explicit sensitivity analyses to alternative sequences or workload-mix assumptions, nor direct comparisons to external traces. Our sequences were selected to reflect realistic inter-dependencies observed in Microsoft Azure production data. To address this, we will add a dedicated sensitivity analysis subsection in the revised manuscript. This will include variations in arrival/decommissioning rates (e.g., faster/slower AI accelerator ramps) and workload mixes (e.g., increased storage-to-GPU ratios drawn from public industry reports). We will also add a discussion of generalizability, referencing publicly available datacenter utilization statistics from other operators to support that the stranding effects are characteristic of high-density AI deployments rather than artifacts of our specific traces.
    Revision: yes

  2. Referee: No validation details, error analysis, or robustness checks against variations in oversubscription factors and placement policies are supplied. Because these parameters directly affect the joint resource stranding calculations that drive the shift from installed MW to deployable capacity, their omission is load-bearing for the quantitative results.

    Authors: We agree that the current manuscript lacks detailed validation, error analysis, and robustness checks on oversubscription and placement policies, which are central to the stranding calculations. The operational parameters are derived from Microsoft Azure traces, but these aspects were not expanded upon. In the revision we will augment the methods and evaluation sections with: (1) validation metrics comparing framework outputs to the source production data, including quantitative error bounds where the data permit; (2) robustness sweeps over oversubscription factors (1.1x–2.0x) and placement policies (e.g., random, power-aware, and affinity-based); and (3) explicit quantification of how these variations affect multi-resource stranding and the deployable-capacity metric. These additions will directly support the load-bearing quantitative results.
    Revision: yes
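As a rough picture of what a sweep over oversubscription factors could look like, here is a hedged Python sketch. It is not taken from the paper or the rebuttal. It assumes the oversubscription factor is the ratio of admitted nameplate power to physical capacity, which neither text defines, and the function name `sweep_oversubscription` and all numbers are hypothetical.

```python
def sweep_oversubscription(capacity_kw, rack_nameplate_kw, factors):
    """For each factor, return (whole racks admitted, nameplate budget left unused).

    Assumption: a factor f admits racks until their summed nameplate power
    would exceed capacity_kw * f.
    """
    results = {}
    for f in factors:
        budget_kw = capacity_kw * f
        racks = int(budget_kw // rack_nameplate_kw)
        results[f] = (racks, budget_kw - racks * rack_nameplate_kw)
    return results

# Hypothetical 2,500 kW domain and 130 kW racks, factors within the 1.1x-2.0x range.
sweep = sweep_oversubscription(2500, 130, (1.1, 1.5, 2.0))
for f, (racks, leftover_kw) in sweep.items():
    print(f"{f:.1f}x: {racks} racks, {leftover_kw:.0f} kW of budget unused")
```

A real robustness check would rerun the full lifecycle simulation at each factor and under each placement policy. This sketch only shows the shape of the loop and that the unused budget does not shrink monotonically as the factor grows.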

Circularity Check

0 steps flagged

No circularity: results from external-data simulation framework

full rationale

The paper presents a simulation framework that evaluates power delivery hierarchies by combining projection models for GPU/compute/storage deployments with operational factors from Microsoft Azure production data. It reports outcomes on stranding, capacity, capex and performance across arrival/oversubscription/decommissioning sequences. No equations, derivations or self-citations are exhibited that reduce any claimed result to fitted inputs or prior author work by construction. The central claim (deployable capacity over time, not installed MW) follows from the framework outputs rather than being tautological with its inputs. This is a standard empirical systems study whose load-bearing elements rest on external traces and models.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Framework depends on projection models and Azure operational factors whose accuracy is assumed but not independently verified in the provided abstract.

axioms (1)
  • domain assumption: Production data from Microsoft Azure is representative of general datacenter operations and inter-dependencies.
    Used to ground operational factors in the framework.

pith-pipeline@v0.9.0 · 5819 in / 1179 out tokens · 68136 ms · 2026-05-19T18:19:58.884570+00:00 · methodology

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages

  1. [1]

    AccuTech Communications. 2024. Best Data Center Build Out Cost: Top 5 Key Factors. https://accutechcom.com/data-center-build-out-cost/ States Tier III construction typically $7–$9M per MW (illustrative benchmark)

  2. [2]

    ASCO Power Technologies. 2019. Power Redundancy Schemes for Data Centers. Technical Report PS-WP-REDUNDANCY-DATA. Uptime Institute. https://www.se.com/sg/en/download/document/PS-WP-REDUNDANCY-DATA/

  3. [3]

    Victor Avelar, Patrick Donovan, Wendy Torell, and Maria A. Torres Arango. 2025. How 6 AI Attributes Change Data Center Design. Technical Report White Paper 110, v3. Schneider Electric. https://www.se.com/us/en/download/document/SPD_WP110_EN/

  4. [4]

    Hugo Barbalho, Patricia Kovaleski, Beibin Li, Luke Marshall, Marco Molinaro, Abhisek Pan, Eli Cortez, Matheus Leao, Harsh Patwari, Zuzu Tang, Larissa Rozales Gonçalves, David Dion, Thomas Moscibroda, and Ishai Menache. 2023. Virtual Machine Allocation with Lifetime Predictions. In Proceedings of Machine Learning and Systems, D. Song, M. Carbin, and T. Ch...

  5. [5]

    Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan. 2019. The Datacenter as a Computer: Designing Warehouse-Scale Machines (3 ed.). Springer, Cham. XVIII, 189 pages. doi:10.1007/978-3-031-01761-2

  6. [6]

    Noman Bashir, Nan Deng, Krzysztof Rzadca, David Irwin, Sree Kodak, and Rohit Jnagal. 2021. Take it to the limit: peak prediction-driven resource overcommitment in datacenters. In Proceedings of the Sixteenth European Conference on Computer Systems (Online Event, United Kingdom) (EuroSys ’21). Association for Computing Machinery, New York, NY, USA, 556–57...

  7. [7]

    Saumil Baxi, Kayla Cummings, Alexandre Jacquillat, Sean Lo, Rob McDonald, Konstantina Mellou, Ishai Menache, and Marco Molinaro

  8. [8]

    Online Rack Placement in Large-Scale Data Centers: Online Sampling Optimization and Deployment. arXiv:2501.12725 [math.OC] https://arxiv.org/abs/2501.12725

  9. [9]

    Ricardo Bianchini, Christian Belady, and Anand Sivasubramaniam

  10. [10]

    Datacenter power and energy management: past, present, and future. IEEE Micro (2024)

  11. [11]

    Robert Bunger and Wendy Torell. 2019. Capital Cost Analysis of Immersive Liquid-Cooled vs. Air-Cooled Large Data Centres. Technical Report White Paper 282. Schneider Electric. Detailed CapEx comparison of 2MW datacenter configurations. Provides itemized infrastructure costs including generators, UPS, switchgear, and cooling subsystems

  12. [12]

    Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, and Mosharaf Chowdhury. 2024. Reducing Energy Bloat in Large Model Training. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles (Austin, TX, USA) (SOSP ’24). Association for Computing Machinery, New York, NY, USA, 144–159. doi:10.1145/3694715.3695970

  13. [13]

    Maxime C. Cohen, Philipp Keller, Vahab Mirrokni, and Morteza Zadimoghaddam. 2017. Overcommitment in Cloud Services Bin packing with Chance Constraints. In Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems (Urbana-Champaign, Illinois, USA) (SIGMETRICS ’17 Abstracts). Association for Comput...

  14. [14]

    doi:10.1145/3078505.3078530

  15. [15]

    Data Center Frontier. 2025. OCP Summit 2025 Highlights: Advancing Data Center Densification and Security. https://www.datacenterfrontier.com/design/article/55324586/ocp-summit-2025-highlights-advancing-data-center-densification-and-security Industry shift toward 800V DC power distribution for megawatt rack scales

  16. [16]

    Datacenters.com. 2025. Next-Gen Processors: Redefining Data Center Performance in 2025. https://www.datacenters.com/news/next-gen-processors-how-they-re-redefining-data-center-performance High-performance processors pushing rack densities beyond 80 kW require liquid cooling

  17. [17]

    Dgtl Infra. 2024. How Much Does it Cost to Build a Data Center? https://dgtlinfra.com/how-much-does-it-cost-to-build-a-data-center/ Component-level cost breakdowns for Tier III/IV facilities

  18. [18]

    Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, and Junchen Jiang. 2025. PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications. In Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles (Lotte Hotel World, S...

  19. [19]

    Lisa Duignan. 2024. Data centre cost index 2024. https://www.turnerandtownsend.com/insights/data-centre-cost-index-2024/ Global construction cost benchmarks for data centers

  20. [20]

    Daniel Ellsworth, Tapasya Patki, Swann Perarnau, Sangmin Seo, Abdelhalim Amer, Judicael Zounmevo, Rinku Gupta, Kazutomo Yoshii, Henry Hoffman, Allen Malony, Martin Schulz, and Pete Beckman

  21. [21]

    Systemwide Power Management with Argo. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 1118–1121. doi:10.1109/IPDPSW.2016.81

  22. [22]

    Marius Eriksen, Kaushik Veeraraghavan, Yusuf Abdulghani, Andrew Birchall, Po-Yen Chou, Richard Cornew, Adela Kabiljo, Ranjith Kumar S, Maroo Lieuw, Justin Meza, Scott Michelson, Thomas Rohloff, Hayley Russell, Jeff Qin, and Chunqiang Tang. 2023. Global Capacity ...

  23. [23]

    Xiaobo Fan, Wolf-Dietrich Weber, and Luiz Andre Barroso. 2007. Power provisioning for a warehouse-sized computer. SIGARCH Comput. Archit. News 35, 2 (June 2007), 13–23. doi:10.1145/1273440.1250665

  24. [24]

    Daya Guo et al. 2025. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 8081 (2025), 633–638. doi:10.1038/s41586-025-09422-z

  25. [25]

    Nishant Gupta, Iyswarya Narayanan, Shivam Handa, Sayak Chakraborti, Pankit Thapar, Baohua Shan, Ariel Rao, Yuanlai Liu, Pengyuan Wang, Yuqing Wu, Qingyi Gao, Chris Chao-Chun Cheng, Sihan You, Louis Huang, Jingyuan Fan, Kenny Yu, Kevin Lin, Tengfei Mu, Parth Malani, Haiying Wang, Trey Lu, and Peter Zhang. 2024. Dynamic Idle Resource Leasing To Safely Overs...

  26. [26]

    James Hamilton. 2009. Internet-scale service infrastructure efficiency. SIGARCH Comput. Archit. News 37, 3 (June 2009), 232. doi:10.1145/1555815.1555756

  27. [27]

    Pearl Hu. 2019. Electrical Distribution Equipment in Data Center Environments (White Paper 61, Rev. 2). Technical Report. Schneider Electric. https://www.se.com/us/en/download/document/SPD_VAVR-8W4MEX_EN/ Equipment-level per-kW ranges for MV/LV switchgear, transformers, PDUs, panels

  28. [28]

    Vasileios Kontorinis, Liuyi Eric Zhang, Baris Aksanli, Jack Sampson, Houman Homayoun, Eddie Pettis, Dean M. Tullsen, and Tajana Simunic Rosing. 2012. Managing distributed UPS energy for effective power capping in data centers. In 2012 39th Annual International Symposium on Computer Architecture (ISCA). 488–499. doi:10.1109/ISCA.2012.6237042

  29. [29]

    Alok Gautam Kumbhare, Reza Azimi, Ioannis Manousakis, Anand Bonde, Felipe Frujeri, Nithish Mahalingam, Pulkit A Misra, Seyyed Ahmad Javadi, Bianca Schroeder, Marcus Fontoura, et al. 2021. Prediction-Based power oversubscription in cloud platforms. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). 473–487

  30. [30]

    Ming-Chi Kuo. 2025. NVIDIA AI Server Power Roadmap: Kyber’s Next-Generation Strategy from GPU/Rack-Level to Data-Center Scale. https://medium.com/@mingchikuo/nvidia-ai-server-power-roadmap-kybers-next-generation-strategy-from-gpu-rack-level-to-data-center-e380b459e183 Industry analysis of NVIDIA’s reference design scope extending to entire data center

  31. [31]

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica

  32. [32]

    Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles (Koblenz, Germany) (SOSP ’23). Association for Computing Machinery, New York, NY, USA, 611–626. doi:10.1145/3600006.3613165

  33. [33]

    Yang Li, Charles R. Lefurgy, Karthick Rajamani, Malcolm S. Allen-Ware, Guillermo J. Silva, Daniel D. Heimsoth, Saugata Ghose, and Onur Mutlu. 2019. A Scalable Priority-Aware Approach to Managing Data Center Server Power. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 701–714. doi:10.1109/HPCA.2019.00067

  34. [34]

    Rui Peng Liu, Konstantina Mellou, Evelyn Xiao-Yue Gong, Beibin Li, Thomas Coffee, Jeevan Pathuri, David Simchi-Levi, and Ishai Menache

  35. [35]

    Efficient Cloud Server Deployment Under Demand Uncertainty. Manufacturing & Service Operations Management 27, 2 (2025), 425–440. doi:10.1287/msom.2023.0372

  36. [36]

    Kevin McCarthy and Victor Avelar. 2016. Comparing UPS System Design Configurations. Technical Report White Paper 75. Schneider Electric – Data Center Science Center. https://download.schneider-electric.com/files?p_Doc_Ref=SPD_SADE-5TPL8X_EN Revision 4

  37. [37]

    John McWilliams, Ethan Tribble, Adrian Conforti, and Jason DOrlando

  38. [38]

    Data Center Development Cost Guide 2025. https://cushwake.cld.bz/Data-Center-Development-Cost-Guide-2025 Market-level totals and cost drivers across U.S. regions

  39. [39]

    Chris Mellor. 2025. Power Consumption and Datacenters. https://blocksandfiles.com/2025/07/14/power-consumption-and-data-centers/ Dell’Oro analysis: AI workloads require 60-120 kW/rack for accelerated servers

  40. [40]

    Konstantina Mellou, Marco Molinaro, and Rudy Zhou. 2024. The Power of Migrations in Dynamic Bin Packing. Proc. ACM Meas. Anal. Comput. Syst. 8, 3, Article 45 (Dec. 2024), 28 pages. doi:10.1145/3700435

  41. [41]

    Timothy Prickett Morgan. 2025. Nvidia Draws GPU System Roadmap Out To 2028. https://www.nextplatform.com/2025/03/19/nvidia-draws-gpu-system-roadmap-out-to-2028/ Rubin Ultra VR300 NVL576 consuming over 600 kilowatts, 21× performance of GB200

  42. [42]

    Christopher Muir, Luke Marshall, and Alejandro Toriello. 2024. Temporal Bin Packing with Half-Capacity Jobs. INFORMS Journal on Optimization 6, 1 (2024), 46–62. doi:10.1287/ijoo.2023.0002

  43. [43]

    NVIDIA Corporation. 2025. Building the 800 VDC Ecosystem for Efficient, Scalable AI Factories. https://developer.nvidia.com/blog/building-the-800-vdc-ecosystem-for-efficient-scalable-ai-factories Technical blog detailing 800V DC power distribution for megawatt rack scales

  44. [44]

    Open Compute Project. 2023. OAI System Liquid Cooling Guidelines. White Paper. Open Compute Project. https://www.opencompute.org/documents/oai-system-liquid-cooling-guidelines-in-ocp-template-mar-3-2023-update-pdf

  45. [45]

    Dylan Patel, Daniel Nishball, Kimbo Chen, Wega Chu, Ivan Chiam, and Cheang Kang Wen. 2025. Another Giant Leap: The Rubin CPX Specialized Accelerator & Rack. https://newsletter.semianalysis.com/p/another-giant-leap-the-rubin-cpx-specialized-accelerator-rack

  46. [46]

    Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, and Ricardo Bianchini. 2024. Charac- terizing Power Management Opportunities for LLMs in the Cloud. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (La Jolla, CA, USA)(ASP...

  47. [47]

    Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, and Ricardo Bianchini. 2024. Splitwise: Efficient Generative LLM Inference Using Phase Splitting. In2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 118–132. doi:10.1109/ISCA59077.2024.00019

  48. [48]

    Leonardo Piga, Iyswarya Narayanan, Aditya Sundarrajan, Matt Skach, Qingyuan Deng, Biswadip Maity, Manoj Chakkaravarthy, Alison Huang, Abhishek Dhanotia, and Parth Malani. 2024. Expanding data- center capacity with dvfs boosting: A safe and scalable deployment experience. InProceedings of the 29th ACM International Conference on Architectural Support for P...

  49. [49]

    Ramya Raghavendra, Parthasarathy Ranganathan, Vanish Talwar, Zhikui Wang, and Xiaoyun Zhu. 2008. No "power" struggles: co- ordinated multi-level power management for the data center. InPro- ceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems(Seattle, WA, USA) (ASPLOS XIII). Association for...

  50. [50]

    Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, and Yuxiong He. 2022. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. https://proceedings.mlr.press/v162/rajbhandari22a.html

  51. [51]

    Varun Sakalkar, Vasileios Kontorinis, David Landhuis, Shaohong Li, Darren De Ronde, Thomas Blooming, Anand Ramesh, James Kennedy, Christopher Malone, Jimmy Clidaras, and Parthasarathy Ranganathan. 2020. Data Center Power Oversubscription with a Medium Voltage Power Plane and Priority-Aware Capping. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, USA, 497–511. https://dl.acm.org/doi/abs/10.1145/3373376.3378533

  53. [53]

    Max Smolaks. 2023. Data center costs set to rise and rise. https://journal.uptimeinstitute.com/data-center-costs-set-to-rise-and-rise/ Analysis of supply chain impacts on infrastructure costs.

  54. [54]

    Jovan Stojkovic, Chaojie Zhang, Inigo Goiri, and Ricardo Bianchini. 2025. Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework. arXiv:2509.26534 [cs.AI] https://arxiv.org/abs/2509.26534

  56. [56]

    Jovan Stojkovic, Chaojie Zhang, Inigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, and Ricardo Bianchini. 2025. TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms. Association for Computing Machinery, New York, NY, USA, 1266–1281. https://doi.org/10.1145/3676641.3716025

  57. [57]

    Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, and Esha Choukse. 2025. DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency. In 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). 1348–. doi:10.1109/HPCA61900.2025.00102

  59. [59]

    The Register. 2023. Intel and AMD Just Created a Headache for Legacy Datacenters. https://www.theregister.com/2023/01/19/intel_amd_uptime_cooling/ AMD Epyc 4 at 400W and Intel Xeon Scalable at 350W TDP.

  60. [60]

    Thunder Said Energy. 2024. Economic costs of data-centers? https://thundersaidenergy.com/downloads/data-centers-the-economics/ Cost breakdown analysis including mechanical systems.

  61. [61]

    Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, and Fabrizio Petrini. 2025. Scaling Intelligence: Designing Data Centers for Next-Gen Language Models. arXiv:2506.15006 [cs.AR] https://arxiv.org/abs/2506.15006

  62. [62]

    Tom’s Hardware. 2025. Nvidia Announces Reference Design for Colossal Gigawatt-scale Omniverse DSX Data Centers. https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-announces-reference-design-for-gargantuan-gigawatt-scale-omniverse-dsx-data-centers-single-data-center-requires-a-nuclear-reactors-worth-of-power-generation NVIDIA’s ...

  63. [63]

    Wendy Torell. 2016. Cost, Speed, and Reliability Tradeoffs between N+1 UPS Configurations. Technical Report White Paper. Schneider Electric – Data Center Science Center. https://www.apc.com/us/en/support/resources-tools/white-papers/cost-speed-and-reliability-tradeoffs-between-n1-ups-configurations.jsp Revision 2.

  65. [65]

    W Pitt Turner IV, JH PE, PE Seader, and KJ Brill. 2006. Tier classifications define site infrastructure performance. Uptime Institute 17 (2006).

  66. [66]

    Jarred Walton. 2025. Nvidia Announces Rubin GPUs in 2026, Rubin Ultra in 2027, Feynman Also Added to Roadmap. https://www.tomshardware.com/pc-components/gpus/nvidia-announces-rubin-gpus-in-2026-rubin-ultra-in-2027-feynam-after Rubin NVL144 specifications: 3.6 EFLOPS FP4, 288GB HBM4, 13 TB/s bandwidth.

  67. [67]

    Jarred Walton. 2025. Nvidia Shows Off Rubin Ultra with 600,000-Watt Kyber Racks and Infrastructure, Coming in 2027. https://www.tomshardware.com/pc-components/gpus/nvidia-shows-off-rubin-ultra-with-600-000-watt-kyber-racks-and-infrastructure-coming-in-2027 Kyber rack architecture targeting 600kW per rack with Rubin Ultra GPUs.

  68. [68]

    Di Wang, Chuangang Ren, Anand Sivasubramaniam, Bhuvan Urgaonkar, and Hosam Fathy. 2012. Energy storage in datacenters: what, where, and how much? SIGMETRICS Perform. Eval. Rev. 40, 1 (June 2012), 187–198. doi:10.1145/2318857.2254780

  69. [69]

    Qiang Wu, Qingyuan Deng, Lakshmi Ganesh, Chang-Hong Hsu, Yun Jin, Sanjeev Kumar, Bin Li, Justin Meza, and Yee Jiun Song. 2016. Dynamo: facebook’s data center-wide power management system. In Proceedings of the 43rd International Symposium on Computer Architecture (Seoul, Republic of Korea) (ISCA ’16). IEEE Press, 469–480. doi:10.1109/ISCA.2016.48

  70. [70]

    Chaojie Zhang, Alok Gautam Kumbhare, Ioannis Manousakis, Deli Zhang, Pulkit A. Misra, Rod Assis, Kyle Woolcock, Nithish Mahalingam, Brijesh Warrier, David Gauthier, Lalu Kunnath, Steve Solomon, Osvaldo Morales, Marcus Fontoura, and Ricardo Bianchini. 2021. Flex: High-Availability Datacenters With Zero Reserved Power. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). 319–332. doi:10.1109/ISCA52012.2021.00033

  72. [72]

    Hengrui Zhang, Pratyush Patel, August Ning, and David Wentzlaff. 2025. SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference. arXiv:2510.08544 [cs.AR] https://arxiv.org/abs/2510.08544

  74. [74]

    Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. 2024. DistServe: disaggregating prefill and decoding for goodput-optimized large language model serving. In Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation (Santa Clara, CA, USA) (OSDI’24). USENIX Association, USA, Art...

    It is a first-order comparative model, not a topology-accurate runtime simulator.

    Communication is modeled with bandwidth-time approximations rather than collective-specific kernels.

    We do not model fine-grained overlap among TP communication, EP communication, and compute.

    A.5 Workload Model Suite

    Table 2 lists the model configurations used in the throughput study. The MoE suite spans three orders of magnitude in total parameters, from a 0.6 T model whose experts fit within a single rack-local NVLink domain to a 401 T model that ...

    Storage racks are anchored at 15 kW in 2025 and grow at {2%, 4%, 6%} annually, reaching {18, 22, 26} kW by 2034. These trajectories define the non-GPU rack power inputs used by the SKU generation procedure. Unless otherwise 17 Grant Wilkins, Fiodar Kazhamiaka, Alok Gautam Kumbhare, Chaojie Zhang, and Ricardo Bianchini Table 3. Deployment architecture param...