Designing Datacenter Power Delivery Hierarchies for the AI Era

Alok Gautam Kumbhare; Chaojie Zhang; Fiodar Kazhamiaka; Grant Wilkins; Ricardo Bianchini

arxiv: 2605.16255 · v1 · pith:6EIIJK7Snew · submitted 2026-05-15 · 💻 cs.DC · cs.AI


Pith reviewed 2026-05-19 18:19 UTC · model grok-4.3

classification 💻 cs.DC cs.AI

keywords datacenter power delivery · AI accelerators · power stranding · deployable capacity · power density · oversubscription · deployment sequences


The pith

AI datacenters must prioritize deployable capacity over time instead of installed megawatts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework for assessing power delivery hierarchies in datacenters facing increasing AI accelerator densities. It evaluates designs by simulating sequences of deployments and measuring metrics like throughput, power usage, and costs based on production data. This approach reveals that stranding of power resources significantly impacts what capacity can actually be used. A reader would care because power from the grid is limited and designs need to support multiple hardware generations efficiently. The work shifts the focus to long-term deployable performance as the key objective.

Core claim

The central claim is that for AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time. The framework combines projection models for GPU, compute, and storage with operational factors to evaluate designs over realistic sequences, showing that multi-resource stranding changes deployable capacity, effective capital expenditure, and delivered performance.

What carries the argument

A framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences.
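To make the idea of evaluating a design over a deployment sequence concrete, here is a minimal Python sketch. It is not the paper's framework: the function name `simulate_lifecycle`, the first-fit placement rule, and all example numbers are assumptions chosen for illustration. It replays arrivals and decommissions against a set of line-ups and records how much power is actually deployed at each time step.

```python
def simulate_lifecycle(arrivals, lineup_caps_kw, horizon):
    """Toy lifecycle replay (illustrative assumption, not the paper's simulator).

    arrivals: list of (arrival_step, power_kw, lifetime_steps).
    Placement is first-fit across line-ups.
    Returns (deployed_kw_per_step, rejected_arrivals).
    """
    loads = [0] * len(lineup_caps_kw)
    live = []  # (end_step, lineup_index, power_kw)
    deployed, rejected = [], []
    for t in range(horizon):
        # Decommission deployments whose lifetime has ended.
        still_live = []
        for end, idx, kw in live:
            if end <= t:
                loads[idx] -= kw
            else:
                still_live.append((end, idx, kw))
        live = still_live
        # Place this step's arrivals on the first line-up with room.
        for t_arrive, kw, lifetime in arrivals:
            if t_arrive != t:
                continue
            for idx, cap in enumerate(lineup_caps_kw):
                if loads[idx] + kw <= cap:
                    loads[idx] += kw
                    live.append((t + lifetime, idx, kw))
                    break
            else:
                rejected.append((t_arrive, kw))
        deployed.append(sum(loads))
    return deployed, rejected


# Hypothetical example: two 1000 kW line-ups, 600 kW deployments.
arrivals = [(0, 600, 2), (0, 600, 4), (1, 600, 4), (2, 600, 4)]
deployed, rejected = simulate_lifecycle(arrivals, [1000, 1000], horizon=5)
print(deployed)
print(rejected)
```

In this toy run the two line-ups provide 2000 kW in total, yet 600 kW deployments never occupy more than 1200 kW, and the arrival at step 1 is rejected even though 800 kW is nominally free. That gap between installed and deployable power is the kind of quantity the framework is described as tracking over time.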

Load-bearing premise

The projection models for deployments and the operational factors from production data accurately represent the joint effects of topology, granularity, policy, oversubscription, and workload over changing sequences.

What would settle it

Measuring the actual deployable capacity and utilization in an AI datacenter over multiple years and comparing it to the framework's predictions would test the central claim.

Figures

Figures reproduced from arXiv: 2605.16255 by Alok Gautam Kumbhare, Chaojie Zhang, Fiodar Kazhamiaka, Grant Wilkins, Ricardo Bianchini.

**Figure 1.** P99 of rack power density since 2020 for datacenter deployments, showing distinct accelerator generations and a widening gap between GPU and non-GPU power density. Density is normalized to the maximum P99 value observed in each quarter at Azure. … project rack- and pod-scale systems approaching 1 MW in a few years [34, 38, 55]

**Figure 2.** Each marker represents a combination of datacenter design, workload, and rack density projections. Colors represent different LLM mixture-of-experts models being served across the fleet. Each combination is analyzed with our framework and compared on throughput of LLM inference per watt versus effective fleet cost. Highlighted points represent diverse workloads, labels describe the points’ design an…

**Figure 3.** Example of major components in a datacenter power-delivery hierarchy, from grid and generator/battery down to the rack level. Generic diagram not showing redundancy. … Server PSU, though the exact ordering varies across facilities [5, 19, 28, 39, 44, 58, 59]. We use line-up to refer to a common upstream electrical branch: a set of rows or racks that share the same upstream power-delivery equipment. Deploym…

**Figure 5.** CDF of UPS stranding under (a) single-hall Monte Carlo analysis and (b) the final state of an 8-year fleet-scale lifecycle simulation. The local view suggests that 4𝑁/3 and 3+1 are similar. The lifecycle simulation separates them: 3+1 develops higher tail stranding and requires additional halls to serve the same deployed demand. 3.1 A Tale of Two Designs: Consider a 4𝑁/3 distributed-redundant hall (as shown…

**Figure 6.** Single-hall, single-SKU stranding under increasing deployment power. Each experiment fills one hall with repeated deployments of the same SKU and reports the capacity left undeployable at saturation. Distributed redundancy (4𝑁/3) strands capacity when too few parents have enough simultaneous failover headroom. Block redundancy (3+1) strands capacity at divisibility thresholds of the line-up or UPS-block …
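The divisibility effect described for block redundancy can be illustrated with a short Python sketch. It assumes a simplified model that is not taken from the paper: each active line-up is filled independently with identical deployments, and whatever remainder cannot hold one more deployment is counted as stranded. The function name and the 2500 kW line-up size are hypothetical, and the distributed (4N/3) failover-headroom mechanism is not modeled here.

```python
def block_stranded_fraction(lineup_capacity_kw, active_lineups, deployment_kw):
    """Fraction of active capacity left undeployable at saturation.

    Simplifying assumption: each line-up is filled independently with
    identical deployments, so the leftover per line-up is the remainder
    of capacity divided by deployment size.
    """
    fits_per_lineup = lineup_capacity_kw // deployment_kw
    deployed_kw = fits_per_lineup * deployment_kw * active_lineups
    total_kw = lineup_capacity_kw * active_lineups
    return (total_kw - deployed_kw) / total_kw


# Hypothetical 3+1 hall: three active 2500 kW line-ups plus one reserve.
for deployment_kw in (500, 600, 1250, 1300):
    frac = block_stranded_fraction(2500, 3, deployment_kw)
    print(f"{deployment_kw} kW deployments: {frac:.0%} stranded")
```

Under these assumptions, 500 kW and 1250 kW deployments divide the line-up evenly and strand nothing, while 1300 kW deployments fit only once per line-up and leave 48% unusable. Stranding jumps as the deployment size crosses a divisor of the line-up capacity.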

**Figure 7.** Line-up-level stranding in Monte Carlo simulation of a 10𝑁/8 and 8+2 hall populated with storage, compute, and GPU racks under four online placement policies. Variance minimization yields the lowest stranding. … adding a rack does not exceed effective capacity at any ancestor node, where effective capacity is the residual capacity available after enforcing redundancy constraints. Under distributed 𝑥𝑁/𝑦 r…
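A feasibility check against ancestor capacity, combined with a variance-minimizing choice among feasible line-ups, might look like the Python sketch below. Several things here are assumptions rather than the paper's definitions: the two-level hierarchy (hall above line-ups), the reading of distributed xN/y redundancy as limiting each unit to y/x of its nameplate rating, the use of population variance of line-up loads as the objective, and every function and parameter name.

```python
from statistics import pvariance


def effective_capacity_kw(nameplate_kw, x, y):
    """Per-unit effective capacity under xN/y, assuming each of the
    x units may carry at most y/x of its nameplate rating."""
    return nameplate_kw * y / x


def place_variance_min(loads_kw, lineup_cap_kw, hall_cap_kw, rack_kw):
    """Return the index of the line-up to place the rack on, or None.

    A placement is feasible only if neither the line-up nor its
    ancestor (the hall) would exceed its effective capacity. Among
    feasible line-ups, pick the one that minimizes load variance.
    """
    if sum(loads_kw) + rack_kw > hall_cap_kw:
        return None  # the hall-level ancestor would be exceeded
    best_idx, best_var = None, None
    for idx, load in enumerate(loads_kw):
        if load + rack_kw > lineup_cap_kw:
            continue  # this line-up would be exceeded
        trial = list(loads_kw)
        trial[idx] = load + rack_kw
        var = pvariance(trial)
        if best_var is None or var < best_var:
            best_idx, best_var = idx, var
    return best_idx


# Hypothetical 10N/8 example: 500 kW nameplate gives 400 kW effective.
cap = effective_capacity_kw(500, x=10, y=8)
choice = place_variance_min([100, 300, 200], cap, hall_cap_kw=1000, rack_kw=150)
print(cap, choice)
```

Here the 150 kW rack goes to the least-loaded line-up, since that placement evens out the loads the most. A 350 kW rack would be rejected because no single line-up has that much effective headroom, even though the hall as a whole does.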

**Figure 9.** Validation of our simulator against historical rack placements in Azure over 6 years to a subset of both new and mature data halls, comparing the simulated unused power distribution to the observed one. Unused power is normalized by the maximum observed value. We report unused rather than stranded power because some halls are not yet saturated. This mode is useful for identifying capacity harmonics an…

**Figure 11.** Normalized rack-power distributions for Azure general-compute and storage deployments since 2023. These results are clustered into empirical distributions of representative SKU groups for future trace generation. 5.2 Rack Resource Projections and Lifecycle Parameters: Arrival envelopes determine how much capacity enters the fleet, yet it is necessary to specify how each arriving deployment is assigned ra…

**Figure 12.** Projected power-density trajectories for GPU racks, GPU pods, CPU compute racks, and storage racks.

**Figure 13.** Tail (P90) site stranding over time for block-redundant (3+1, 8+2) and distributed (4𝑁/3, 10𝑁/8) designs under Low, Medium, and High GPU TDP projections. Lines show the median across pod compositions (3–7 racks); bands span the min–max range. Designs that appear similar under static capacity metrics separate once evaluated over the deployment lifecycle.

**Figure 14.** Incremental effective cost above each design’s base $/W. Bars decompose this excess into reserve cost and stranding-induced cost. Error bars show standard deviation across pod compositions. The main moving term is the cost of stranded capacity, not the nominal cost of reserve. … reserve cost and stranding-induced cost. All designs begin with similar base costs, and reserve varies only modestly with redundan…
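One way such a decomposition could be computed is sketched below in Python. The accounting is an assumption, not the paper's cost model: capital cost is taken as a base $/W applied to all installed capacity including reserve, the reserve term is the extra cost per usable watt, and the stranding term is the further increase once stranded watts are removed from the denominator. The function name, parameters, and example values are hypothetical.

```python
def effective_cost_breakdown(base_cost_per_w, installed_w, usable_w, stranded_fraction):
    """Split effective $/W into base, reserve, and stranding terms.

    Assumed accounting (illustrative only):
      capex             = base $/W * installed W (reserve included)
      cost per usable W = capex / usable W
      effective $/W     = capex / deployable W
    """
    capex = base_cost_per_w * installed_w
    deployable_w = usable_w * (1.0 - stranded_fraction)
    cost_per_usable_w = capex / usable_w
    cost_per_deployable_w = capex / deployable_w
    return {
        "base": base_cost_per_w,
        "reserve": cost_per_usable_w - base_cost_per_w,
        "stranding": cost_per_deployable_w - cost_per_usable_w,
        "effective": cost_per_deployable_w,
    }


# Hypothetical hall: 4 MW installed, 3 MW usable, 10% of usable power stranded.
parts = effective_cost_breakdown(10.0, 4_000_000, 3_000_000, 0.10)
for name, value in parts.items():
    print(f"{name}: {value:.2f} $/W")
```

With this accounting the three components always sum to the effective cost, and the stranding term is zero when nothing is stranded. The reserve term depends only on the redundancy ratio, so it stays fixed for a given design while the stranding term moves with the deployment sequence.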

**Figure 15.** P90 tail stranding versus effective per-domain deployment power for 3+1 and 4𝑁/3 across all GPU TDP scenarios and pod compositions. Dashed vertical lines mark 2.5 MW UPS-block quantization thresholds, around which 3+1 exhibits pronounced stranding increases. … the resulting lifecycle stranding follows the topology-specific mechanisms identified in Section 3.4

**Figure 18.** Figure 18 shows that the crossover depends on both workload and hierarchy. For smaller models, most communication is already contained within a rack-scale domain and pods have little to offer for serving throughput, but still incur a placement penalty, so payoff remains near zero or negative. As model size grows, more EP traffic spills across domains, and payoff becomes positive. The crossover is also topology-depe…

Original abstract

Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1 MW per deployment by 2027. This poses a major challenge for datacenter power delivery designers. As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned. Designs must remain efficient over long datacenter lifetimes and multiple hardware generations. Power utilization is particularly important as grid power capacity is a scarce resource in the AI era. Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix. Moreover, these factors evolve over time, have inter-dependencies across multiple resource dimensions, and generally do not lend themselves to closed-form analysis. To address this challenge, we develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences. The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure. Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance, and quantify how rising density from rack- and pod-scale AI systems shapes these outcomes. For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper argues that AI datacenter power planning needs to target deployable capacity over time instead of installed megawatts, using a simulation framework that accounts for stranding across resources.


The main thing here is that installed power stops being the right target once rack densities head toward 1 MW and hardware turns over every few years. Stranding across GPUs, compute, and storage means you can end up with a lot of provisioned capacity that never gets used, so the paper pushes for evaluating designs on what actually deploys and performs over realistic sequences. They built a framework that folds in electrical topology, deployment granularity, placement policy, oversubscription, and workload mix, then runs it against projection models tied to Azure production data. That combination lets them show how multi-resource stranding changes effective capacity, capex, and delivered performance in ways that simpler models miss.

The 2027 density projections make the timing feel relevant given how tight grid power is getting. What they do well is treat the factors as jointly evolving rather than isolated, which matches how real deployments work.

The soft spots are the reliance on proprietary single-operator models and the lack of visible sensitivity checks or external validation in the high-level description. If the arrival and decommissioning patterns or workload mixes are particular to their traces, the stranding effects could look different elsewhere. The abstract stays at the level of directional results without numbers or error analysis, so the strength of the claims rests on details that aren't shown here.

This is for datacenter architects and systems researchers who deal with power-constrained AI infrastructure. Someone evaluating or building large facilities would get usable ideas from the framework even if they swap in their own parameters. It deserves peer review because the problem is current and the approach is practical, though any review would focus on robustness and how far the Azure grounding travels.

Referee Report

2 major / 1 minor

Summary. The manuscript develops a framework for evaluating datacenter power delivery hierarchies under rising AI accelerator densities (projected to approach 1 MW per deployment by 2027). It integrates projection models for GPU/compute/storage deployments with operational factors from Microsoft Azure production data to compute throughput, power, and cost metrics across realistic arrival, oversubscription, and decommissioning sequences. The central claim is that multi-resource stranding materially alters deployable capacity, effective capex, and delivered performance, so the relevant planning objective is deployable capacity over time rather than installed megawatts.

Significance. The result, if it holds, reframes a practical design problem for long-lived datacenters facing scarce grid capacity. Grounding the operational factors in production traces and handling joint evolution of topology, granularity, policy, and workload mix are genuine strengths that could influence how operators and architects prioritize power hierarchies. The absence of disclosed quantitative outputs, sensitivity results, or external validation in the provided description, however, limits the immediate weight of the conclusions.

major comments (2)

The central claim that multi-resource stranding materially changes deployable capacity, capex, and performance rests on the projection models correctly encoding inter-dependencies among electrical topology, deployment granularity, placement policy, oversubscription, and workload mix across evolving sequences. The manuscript provides no sensitivity analysis to alternative arrival/decommissioning sequences or workload-mix assumptions, nor any comparison against external traces, leaving open the possibility that reported stranding effects are artifacts of the chosen Microsoft-Azure-derived sequences rather than a general property of power hierarchies.
No validation details, error analysis, or robustness checks against variations in oversubscription factors and placement policies are supplied. Because these parameters directly affect the joint resource stranding calculations that drive the shift from installed MW to deployable capacity, their omission is load-bearing for the quantitative results.

minor comments (1)

The abstract states high-level outcomes without any numerical values, confidence intervals, or table references; adding at least one concrete example (e.g., percentage change in deployable capacity for a 2027 density scenario) would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for acknowledging the strengths of grounding the framework in production traces and jointly modeling evolving factors. We agree that additional sensitivity and validation details will strengthen the quantitative claims. We respond to each major comment below and will revise the manuscript accordingly.


Referee: The central claim that multi-resource stranding materially changes deployable capacity, capex, and performance rests on the projection models correctly encoding inter-dependencies among electrical topology, deployment granularity, placement policy, oversubscription, and workload mix across evolving sequences. The manuscript provides no sensitivity analysis to alternative arrival/decommissioning sequences or workload-mix assumptions, nor any comparison against external traces, leaving open the possibility that reported stranding effects are artifacts of the chosen Microsoft-Azure-derived sequences rather than a general property of power hierarchies.

Authors: We acknowledge that the submitted manuscript does not present explicit sensitivity analyses to alternative sequences or workload-mix assumptions, nor direct comparisons to external traces. Our sequences were selected to reflect realistic inter-dependencies observed in Microsoft Azure production data. To address this, we will add a dedicated sensitivity analysis subsection in the revised manuscript. This will include variations in arrival/decommissioning rates (e.g., faster/slower AI accelerator ramps) and workload mixes (e.g., increased storage-to-GPU ratios drawn from public industry reports). We will also add a discussion of generalizability, referencing publicly available datacenter utilization statistics from other operators to support that the stranding effects are characteristic of high-density AI deployments rather than artifacts of our specific traces.

revision: yes

Referee: No validation details, error analysis, or robustness checks against variations in oversubscription factors and placement policies are supplied. Because these parameters directly affect the joint resource stranding calculations that drive the shift from installed MW to deployable capacity, their omission is load-bearing for the quantitative results.

Authors: We agree that the current manuscript lacks detailed validation, error analysis, and robustness checks on oversubscription and placement policies, which are central to the stranding calculations. The operational parameters are derived from Microsoft Azure traces, but these aspects were not expanded upon. In the revision we will augment the methods and evaluation sections with: (1) validation metrics comparing framework outputs to the source production data, including quantitative error bounds where the data permit; (2) robustness sweeps over oversubscription factors (1.1x–2.0x) and placement policies (e.g., random, power-aware, and affinity-based); and (3) explicit quantification of how these variations affect multi-resource stranding and the deployable-capacity metric. These additions will directly support the load-bearing quantitative results.

revision: yes

Circularity Check

0 steps flagged

No circularity: results from external-data simulation framework


The paper presents a simulation framework that evaluates power delivery hierarchies by combining projection models for GPU/compute/storage deployments with operational factors from Microsoft Azure production data. It reports outcomes on stranding, capacity, capex and performance across arrival/oversubscription/decommissioning sequences. No equations, derivations or self-citations are exhibited that reduce any claimed result to fitted inputs or prior author work by construction. The central claim (deployable capacity over time, not installed MW) follows from the framework outputs rather than being tautological with its inputs. This is a standard empirical systems study whose load-bearing elements rest on external traces and models.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Framework depends on projection models and Azure operational factors whose accuracy is assumed but not independently verified in the provided abstract.

axioms (1)

domain assumption: Production data from Microsoft Azure is representative of general datacenter operations and inter-dependencies. Used to ground operational factors in the framework.

pith-pipeline@v0.9.0 · 5819 in / 1179 out tokens · 68136 ms · 2026-05-19T18:19:58.884570+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance... the relevant planning objective is not installed megawatts, but deployable capacity over time.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages

[1] AccuTech Communications. 2024. Best Data Center Build Out Cost: Top 5 Key Factors. https://accutechcom.com/data-center-build-out-cost/ States Tier III construction typically $7–$9M per MW (illustrative benchmark).

[2] ASCO Power Technologies. 2019. Power Redundancy Schemes for Data Centers. Technical Report PS-WP-REDUNDANCY-DATA. Uptime Institute. https://www.se.com/sg/en/download/document/PS-WP-REDUNDANCY-DATA/

[3] Victor Avelar, Patrick Donovan, Wendy Torell, and Maria A. Torres Arango. 2025. How 6 AI Attributes Change Data Center Design. Technical Report White Paper 110, v3. Schneider Electric. https://www.se.com/us/en/download/document/SPD_WP110_EN/

[4] Hugo Barbalho, Patricia Kovaleski, Beibin Li, Luke Marshall, Marco Molinaro, Abhisek Pan, Eli Cortez, Matheus Leao, Harsh Patwari, Zuzu Tang, Larissa Rozales Gonçalves, David Dion, Thomas Moscibroda, and Ishai Menache. 2023. Virtual Machine Allocation with Lifetime Predictions. In Proceedings of Machine Learning and Systems, D. Song, M. Carbin, and T. Ch...

[5] Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan. 2019. The Datacenter as a Computer: Designing Warehouse-Scale Machines (3 ed.). Springer, Cham. XVIII, 189 pages. doi:10.1007/978-3-031-01761-2

[6] Noman Bashir, Nan Deng, Krzysztof Rzadca, David Irwin, Sree Kodak, and Rohit Jnagal. 2021. Take it to the limit: peak prediction-driven resource overcommitment in datacenters. In Proceedings of the Sixteenth European Conference on Computer Systems (Online Event, United Kingdom) (EuroSys ’21). Association for Computing Machinery, New York, NY, USA, 556–57...
[7] Saumil Baxi, Kayla Cummings, Alexandre Jacquillat, Sean Lo, Rob McDonald, Konstantina Mellou, Ishai Menache, and Marco Molinaro

[8] Online Rack Placement in Large-Scale Data Centers: Online Sampling Optimization and Deployment. arXiv:2501.12725 [math.OC] https://arxiv.org/abs/2501.12725

[9] Ricardo Bianchini, Christian Belady, and Anand Sivasubramaniam

[10] Datacenter power and energy management: past, present, and future. IEEE Micro (2024)

[11] Robert Bunger and Wendy Torell. 2019. Capital Cost Analysis of Immersive Liquid-Cooled vs. Air-Cooled Large Data Centres. Technical Report White Paper 282. Schneider Electric. Detailed CapEx comparison of 2 MW datacenter configurations. Provides itemized infrastructure costs including generators, UPS, switchgear, and cooling subsystems.

[12] Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, and Mosharaf Chowdhury. 2024. Reducing Energy Bloat in Large Model Training. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles (Austin, TX, USA) (SOSP ’24). Association for Computing Machinery, New York, NY, USA, 144–159. doi:10.1145/3694715.3695970

[13] Maxime C. Cohen, Philipp Keller, Vahab Mirrokni, and Morteza Zadimoghaddam. 2017. Overcommitment in Cloud Services Bin packing with Chance Constraints. In Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems (Urbana-Champaign, Illinois, USA) (SIGMETRICS ’17 Abstracts). Association for Comput...

[14] doi:10.1145/3078505.3078530
[15] Data Center Frontier. 2025. OCP Summit 2025 Highlights: Advancing Data Center Densification and Security. https://www.datacenterfrontier.com/design/article/55324586/ocp-summit-2025-highlights-advancing-data-center-densification-and-security Industry shift toward 800V DC power distribution for megawatt rack scales.

[16] Datacenters.com. 2025. Next-Gen Processors: Redefining Data Center Performance in 2025. https://www.datacenters.com/news/next-gen-processors-how-they-re-redefining-data-center-performance High-performance processors pushing rack densities beyond 80 kW require liquid cooling.

[17] Dgtl Infra. 2024. How Much Does it Cost to Build a Data Center? https://dgtlinfra.com/how-much-does-it-cost-to-build-a-data-center/ Component-level cost breakdowns for Tier III/IV facilities.

[18] Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, and Junchen Jiang. 2025. PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications. In Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles (Lotte Hotel World, S...

[19] Lisa Duignan. 2024. Data centre cost index 2024. https://www.turnerandtownsend.com/insights/data-centre-cost-index-2024/ Global construction cost benchmarks for data centers.

[20] Daniel Ellsworth, Tapasya Patki, Swann Perarnau, Sangmin Seo, Abdelhalim Amer, Judicael Zounmevo, Rinku Gupta, Kazutomo Yoshii, Henry Hoffman, Allen Malony, Martin Schulz, and Pete Beckman

[21] Systemwide Power Management with Argo. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 1118–1121. doi:10.1109/IPDPSW.2016.81

[22] Marius Eriksen, Kaushik Veeraraghavan, Yusuf Abdulghani, Andrew Birchall, Po-Yen Chou, Richard Cornew, Adela Kabiljo, Ranjith Kumar S, Maroo Lieuw, Justin Meza, Scott Michelson, Thomas Rohloff, Hayley Russell, Jeff Qin, and Chunqiang Tang. 2023. Global Capacity ...
[23] Xiaobo Fan, Wolf-Dietrich Weber, and Luiz Andre Barroso. 2007. Power provisioning for a warehouse-sized computer. SIGARCH Comput. Archit. News 35, 2 (June 2007), 13–23. doi:10.1145/1273440.1250665

[24] Daya Guo et al. 2025. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 8081 (2025), 633–638. doi:10.1038/s41586-025-09422-z

[25] Nishant Gupta, Iyswarya Narayanan, Shivam Handa, Sayak Chakraborti, Pankit Thapar, Baohua Shan, Ariel Rao, Yuanlai Liu, Pengyuan Wang, Yuqing Wu, Qingyi Gao, Chris Chao-Chun Cheng, Sihan You, Louis Huang, Jingyuan Fan, Kenny Yu, Kevin Lin, Tengfei Mu, Parth Malani, Haiying Wang, Trey Lu, and Peter Zhang. 2024. Dynamic Idle Resource Leasing To Safely Overs...

[26] James Hamilton. 2009. Internet-scale service infrastructure efficiency. SIGARCH Comput. Archit. News 37, 3 (June 2009), 232. doi:10.1145/1555815.1555756

[27] Pearl Hu. 2019. Electrical Distribution Equipment in Data Center Environments (White Paper 61, Rev. 2). Technical Report. Schneider Electric. https://www.se.com/us/en/download/document/SPD_VAVR-8W4MEX_EN/ Equipment-level per-kW ranges for MV/LV switchgear, transformers, PDUs, panels.

[28] Vasileios Kontorinis, Liuyi Eric Zhang, Baris Aksanli, Jack Sampson, Houman Homayoun, Eddie Pettis, Dean M. Tullsen, and Tajana Simunic Rosing. 2012. Managing distributed UPS energy for effective power capping in data centers. In 2012 39th Annual International Symposium on Computer Architecture (ISCA). 488–499. doi:10.1109/ISCA.2012.6237042

[29] Alok Gautam Kumbhare, Reza Azimi, Ioannis Manousakis, Anand Bonde, Felipe Frujeri, Nithish Mahalingam, Pulkit A Misra, Seyyed Ahmad Javadi, Bianca Schroeder, Marcus Fontoura, et al. 2021. Prediction-Based power oversubscription in cloud platforms. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). 473–487

[30] Ming-Chi Kuo. 2025. NVIDIA AI Server Power Roadmap: Kyber’s Next-Generation Strategy from GPU/Rack-Level to Data-Center Scale. https://medium.com/@mingchikuo/nvidia-ai-server-power-roadmap-kybers-next-generation-strategy-from-gpu-rack-level-to-data-center-e380b459e183 Industry analysis of NVIDIA’s reference design scope extending to entire data center.
[31] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica

[32] Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles (Koblenz, Germany) (SOSP ’23). Association for Computing Machinery, New York, NY, USA, 611–626. doi:10.1145/3600006.3613165

[33] Yang Li, Charles R. Lefurgy, Karthick Rajamani, Malcolm S. Allen-Ware, Guillermo J. Silva, Daniel D. Heimsoth, Saugata Ghose, and Onur Mutlu. 2019. A Scalable Priority-Aware Approach to Managing Data Center Server Power. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 701–714. doi:10.1109/HPCA.2019.00067

[34] Rui Peng Liu, Konstantina Mellou, Evelyn Xiao-Yue Gong, Beibin Li, Thomas Coffee, Jeevan Pathuri, David Simchi-Levi, and Ishai Menache

[35] Efficient Cloud Server Deployment Under Demand Uncertainty. Manufacturing & Service Operations Management 27, 2 (2025), 425–440. doi:10.1287/msom.2023.0372

[36] Kevin McCarthy and Victor Avelar. 2016. Comparing UPS System Design Configurations. Technical Report White Paper 75. Schneider Electric – Data Center Science Center. https://download.schneider-electric.com/files?p_Doc_Ref=SPD_SADE-5TPL8X_EN Revision 4.

[37] John McWilliams, Ethan Tribble, Adrian Conforti, and Jason DOrlando

[38] Data Center Development Cost Guide 2025. https://cushwake.cld.bz/Data-Center-Development-Cost-Guide-2025 Market-level totals and cost drivers across U.S. regions.
[39]

Chris Mellor. 2025. Power Consumption and Datacenters. https://blocksandfiles.com/2025/07/14/power-consumption-and- data-centers/Dell’Oro analysis: AI workloads require 60-120 kW/rack for accelerated servers

[40]

Konstantina Mellou, Marco Molinaro, and Rudy Zhou. 2024. The Power of Migrations in Dynamic Bin Packing. Proc. ACM Meas. Anal. Comput. Syst. 8, 3, Article 45 (Dec. 2024), 28 pages. doi:10.1145/3700435

[41]

Timothy Prickett Morgan. 2025. Nvidia Draws GPU System Roadmap Out To 2028. https://www.nextplatform.com/2025/03/19/nvidia-draws-gpu-system-roadmap-out-to-2028/ Rubin Ultra VR300 NVL576 consuming over 600 kilowatts, 21× performance of GB200

[42]

Christopher Muir, Luke Marshall, and Alejandro Toriello. 2024. Temporal Bin Packing with Half-Capacity Jobs. INFORMS Journal on Optimization 6, 1 (2024), 46–62. doi:10.1287/ijoo.2023.0002

[43]

NVIDIA Corporation. 2025. Building the 800 VDC Ecosystem for Efficient, Scalable AI Factories. https://developer.nvidia.com/blog/building-the-800-vdc-ecosystem-for-efficient-scalable-ai-factories Technical blog detailing 800V DC power distribution for megawatt rack scales

[44]

Open Compute Project. 2023. OAI System Liquid Cooling Guidelines. White Paper. Open Compute Project. https://www.opencompute.org/documents/oai-system-liquid-cooling-guidelines-in-ocp-template-mar-3-2023-update-pdf

[45]

Dylan Patel, Daniel Nishball, Kimbo Chen, Wega Chu, Ivan Chiam, and Cheang Kang Wen. 2025. Another Giant Leap: The Rubin CPX Specialized Accelerator & Rack. https://newsletter.semianalysis.com/p/another-giant-leap-the-rubin-cpx-specialized-accelerator-rack

[46]

Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, and Ricardo Bianchini. 2024. Characterizing Power Management Opportunities for LLMs in the Cloud. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (La Jolla, CA, USA) (ASP... doi:10.1145/3620666.3651329

[47]

Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, and Ricardo Bianchini. 2024. Splitwise: Efficient Generative LLM Inference Using Phase Splitting. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 118–132. doi:10.1109/ISCA59077.2024.00019

[48]

Leonardo Piga, Iyswarya Narayanan, Aditya Sundarrajan, Matt Skach, Qingyuan Deng, Biswadip Maity, Manoj Chakkaravarthy, Alison Huang, Abhishek Dhanotia, and Parth Malani. 2024. Expanding datacenter capacity with dvfs boosting: A safe and scalable deployment experience. In Proceedings of the 29th ACM International Conference on Architectural Support for P...

[49]

Ramya Raghavendra, Parthasarathy Ranganathan, Vanish Talwar, Zhikui Wang, and Xiaoyun Zhu. 2008. No "power" struggles: coordinated multi-level power management for the data center. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (Seattle, WA, USA) (ASPLOS XIII). Association for... doi:10.1145/1346281.1346289

[50]

Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, and Yuxiong He. 2022. DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. https://proceedings.mlr.press/v162/rajbhandari22a.html

[51]

Varun Sakalkar, Vasileios Kontorinis, David Landhuis, Shaohong Li, Darren De Ronde, Thomas Blooming, Anand Ramesh, James Kennedy, Christopher Malone, Jimmy Clidaras, and Parthasarathy Ranganathan. 2020. Data Center Power Oversubscription with a Medium Voltage Power Plane and Priority-Aware Capping. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, USA, 497–511. https://dl.acm.org/doi/abs/10.1145/3373376.3378533

[53]

Max Smolaks. 2023. Data center costs set to rise and rise. https://journal.uptimeinstitute.com/data-center-costs-set-to-rise-and-rise/ Analysis of supply chain impacts on infrastructure costs

[54]

Jovan Stojkovic, Chaojie Zhang, Inigo Goiri, and Ricardo Bianchini. 2025. Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework. arXiv:2509.26534 [cs.AI] https://arxiv.org/abs/2509.26534

[56]

Jovan Stojkovic, Chaojie Zhang, Inigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, and Ricardo Bianchini. 2025. TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms. Association for Computing Machinery, New York, NY, USA, 1266–1281. https://doi.org/10.1145/3676641.3716025

[57]

Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, and Esha Choukse. 2025. DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency. In 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). 1348– doi:10.1109/HPCA61900.2025.00102

[59]

The Register. 2023. Intel and AMD Just Created a Headache for Legacy Datacenters. https://www.theregister.com/2023/01/19/intel_amd_uptime_cooling/ AMD Epyc 4 at 400W and Intel Xeon Scalable at 350W TDP

[60]

Thunder Said Energy. 2024. Economic costs of data-centers? https://thundersaidenergy.com/downloads/data-centers-the-economics/ Cost breakdown analysis including mechanical systems

[61]

Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, and Fabrizio Petrini. 2025. Scaling Intelligence: Designing Data Centers for Next-Gen Language Models. arXiv:2506.15006 [cs.AR] https://arxiv.org/abs/2506.15006

[62]

Tom’s Hardware. 2025. Nvidia Announces Reference Design for Colossal Gigawatt-scale Omniverse DSX Data Centers. https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-announces-reference-design-for-gargantuan-gigawatt-scale-omniverse-dsx-data-centers-single-data-center-requires-a-nuclear-reactors-worth-of-power-generation NVIDIA’s ...

[63]

Wendy Torell. 2016. Cost, Speed, and Reliability Tradeoffs between N+1 UPS Configurations. Technical Report White Paper. Schneider Electric – Data Center Science Center. https://www.apc.com/us/en/support/resources-tools/white-papers/cost-speed-and-reliability-tradeoffs-between-n1-ups-configurations.jsp Revision 2

[65]

W Pitt Turner IV, JH PE, PE Seader, and KJ Brill. 2006. Tier classifications define site infrastructure performance. Uptime Institute 17 (2006)

[66]

Jarred Walton. 2025. Nvidia Announces Rubin GPUs in 2026, Rubin Ultra in 2027, Feynman Also Added to Roadmap. https://www.tomshardware.com/pc-components/gpus/nvidia-announces-rubin-gpus-in-2026-rubin-ultra-in-2027-feynam-after Rubin NVL144 specifications: 3.6 EFLOPS FP4, 288GB HBM4, 13 TB/s bandwidth

[67]

Jarred Walton. 2025. Nvidia Shows Off Rubin Ultra with 600,000-Watt Kyber Racks and Infrastructure, Coming in 2027. https://www.tomshardware.com/pc-components/gpus/nvidia-shows-off-rubin-ultra-with-600-000-watt-kyber-racks-and-infrastructure-coming-in-2027 Kyber rack architecture targeting 600kW per rack with Rubin Ultra GPUs

[68]

Di Wang, Chuangang Ren, Anand Sivasubramaniam, Bhuvan Urgaonkar, and Hosam Fathy. 2012. Energy storage in datacenters: what, where, and how much? SIGMETRICS Perform. Eval. Rev. 40, 1 (June 2012), 187–198. doi:10.1145/2318857.2254780

[69]

Qiang Wu, Qingyuan Deng, Lakshmi Ganesh, Chang-Hong Hsu, Yun Jin, Sanjeev Kumar, Bin Li, Justin Meza, and Yee Jiun Song. 2016. Dynamo: facebook’s data center-wide power management system. In Proceedings of the 43rd International Symposium on Computer Architecture (Seoul, Republic of Korea) (ISCA ’16). IEEE Press, 469–480. doi:10.1109/ISCA.2016.48

[70]

Chaojie Zhang, Alok Gautam Kumbhare, Ioannis Manousakis, Deli Zhang, Pulkit A. Misra, Rod Assis, Kyle Woolcock, Nithish Mahalingam, Brijesh Warrier, David Gauthier, Lalu Kunnath, Steve Solomon, Osvaldo Morales, Marcus Fontoura, and Ricardo Bianchini. 2021. Flex: High-Availability Datacenters With Zero Reserved Power. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). 319–332. doi:10.1109/ISCA52012.2021.00033

[72]

Hengrui Zhang, Pratyush Patel, August Ning, and David Wentzlaff. 2025. SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference. arXiv:2510.08544 [cs.AR] https://arxiv.org/abs/2510.08544

[74]

Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. 2024. DistServe: disaggregating prefill and decoding for goodput-optimized large language model serving. In Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation (Santa Clara, CA, USA) (OSDI ’24). USENIX Association, USA, Art...

[75]

It is a first-order comparative model, not a topology-accurate runtime simulator.

[76]

Communication is modeled with bandwidth-time approximations rather than collective-specific kernels.

[77]

We do not model fine-grained overlap among TP communication, EP communication, and compute.

A.5 Workload Model Suite

Table 2 lists the model configurations used in the throughput study. The MoE suite spans three orders of magnitude in total parameters, from a 0.6 T model whose experts fit within a single rack-local NVLink domain to a 401 T model that ...

[78]

Storage racks are anchored at 15 kW in 2025 and grow at {2%, 4%, 6%} annually, reaching {18, 22, 26} kW by 2034. These trajectories define the non-GPU rack power inputs used by the SKU generation procedure. Unless otherwise

Table 3. Deployment architecture param...

