Designing Datacenter Power Delivery Hierarchies for the AI Era
Pith reviewed 2026-05-19 18:19 UTC · model grok-4.3
The pith
AI datacenters should be planned for deployable capacity over time, not for installed megawatts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that for AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time. The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure, and evaluates designs over realistic arrival, oversubscription, and decommissioning sequences. It shows that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance.
What carries the argument
A framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences.
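To make the idea of evaluating a design over a sequence concrete, here is a minimal sketch, not the paper's framework. It replays arrivals and decommissions against a toy two-domain hierarchy using first-fit placement. The `Domain` class, the event tuples, the 600 kW domain size, and the rack sizes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """One power domain (for example a row feed) with a fixed capacity in kW."""
    capacity_kw: float
    racks: dict = field(default_factory=dict)  # rack_id -> rack power in kW

    def used_kw(self):
        return sum(self.racks.values())


def run_sequence(domains, events):
    """Replay (kind, rack_id, kw) events with first-fit placement.

    Returns the deployed kW after each event and the ids of rejected racks.
    """
    deployed, rejected = [], []
    for kind, rack_id, kw in events:
        if kind == "arrive":
            for d in domains:
                if d.used_kw() + kw <= d.capacity_kw:
                    d.racks[rack_id] = kw
                    break
            else:
                # No single domain has enough headroom for this rack.
                rejected.append(rack_id)
        elif kind == "decommission":
            for d in domains:
                d.racks.pop(rack_id, None)
        deployed.append(sum(d.used_kw() for d in domains))
    return deployed, rejected


# Hypothetical hierarchy: 1200 kW installed, split into two 600 kW domains.
domains = [Domain(600.0), Domain(600.0)]
events = [
    ("arrive", "a", 250.0),
    ("arrive", "b", 250.0),
    ("arrive", "c", 250.0),
    ("arrive", "d", 250.0),
    ("arrive", "e", 250.0),       # 200 kW is free in total, but only 100 kW per domain
    ("decommission", "a", 0.0),
    ("arrive", "f", 350.0),
]
deployed, rejected = run_sequence(domains, events)
print(deployed, rejected)
```

In this toy run rack "e" is rejected even though 200 kW of installed capacity is unused, because the free power is split across domains. That gap between installed and deployable power, and its dependence on the order of events, is the kind of effect the framework is described as measuring.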
Load-bearing premise
The projection models for deployments and the operational factors from production data accurately represent the joint effects of topology, granularity, policy, oversubscription, and workload over changing sequences.
What would settle it
Measuring the actual deployable capacity and utilization in an AI datacenter over multiple years and comparing it to the framework's predictions would test the central claim.
Original abstract
Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1 MW per deployment by 2027. This poses a major challenge for datacenter power delivery designers. As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned. Designs must remain efficient over long datacenter lifetimes and multiple hardware generations. Power utilization is particularly important as grid power capacity is a scarce resource in the AI era. Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix. Moreover, these factors evolve over time, have inter-dependencies across multiple resource dimensions, and generally do not lend themselves to closed-form analysis. To address this challenge, we develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences. The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure. Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance, and quantify how rising density from rack- and pod-scale AI systems shapes these outcomes. For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.
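The abstract's definition of stranded power can be illustrated with a small sketch. It assumes a single row limited by two resources, feed power and rack positions. The function name, the 1200 kW feed, the 20 positions, and the rack densities are hypothetical and are not taken from the paper.

```python
def stranded_fraction(row_power_kw, row_slots, rack_kw):
    """Fraction of a row's provisioned power that identical racks cannot use.

    The row is limited both by its feed (row_power_kw) and by the number of
    physical rack positions (row_slots); whichever binds first strands the
    remainder of the other resource.
    """
    fit_by_power = int(row_power_kw // rack_kw)
    placed = min(row_slots, fit_by_power)
    return 1.0 - placed * rack_kw / row_power_kw


# Hypothetical row sized for 60 kW racks: 1200 kW feed, 20 rack positions.
for rack_kw in (15, 60, 130, 500):
    share = stranded_fraction(1200, 20, rack_kw)
    print(f"{rack_kw:>4} kW racks: {share:.1%} of row power stranded")
```

At the design density of 60 kW nothing is stranded. With 15 kW racks the row runs out of positions and three quarters of the power is stranded, while with 500 kW racks positions are plentiful but only two racks fit under the feed, leaving about a sixth of the power unused.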
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a framework for evaluating datacenter power delivery hierarchies under rising AI accelerator densities (projected to approach 1 MW per deployment by 2027). It integrates projection models for GPU/compute/storage deployments with operational factors from Microsoft Azure production data to compute throughput, power, and cost metrics across realistic arrival, oversubscription, and decommissioning sequences. The central claim is that multi-resource stranding materially alters deployable capacity, effective capex, and delivered performance, so the relevant planning objective is deployable capacity over time rather than installed megawatts.
Significance. The result, if it holds, reframes a practical design problem for long-lived datacenters facing scarce grid capacity. Grounding the operational factors in production traces and handling joint evolution of topology, granularity, policy, and workload mix are genuine strengths that could influence how operators and architects prioritize power hierarchies. The absence of disclosed quantitative outputs, sensitivity results, or external validation in the provided description, however, limits the immediate weight of the conclusions.
major comments (2)
- The central claim that multi-resource stranding materially changes deployable capacity, capex, and performance rests on the projection models correctly encoding inter-dependencies among electrical topology, deployment granularity, placement policy, oversubscription, and workload mix across evolving sequences. The manuscript provides no sensitivity analysis to alternative arrival/decommissioning sequences or workload-mix assumptions, nor any comparison against external traces, leaving open the possibility that reported stranding effects are artifacts of the chosen Microsoft-Azure-derived sequences rather than a general property of power hierarchies.
- No validation details, error analysis, or robustness checks against variations in oversubscription factors and placement policies are supplied. Because these parameters directly affect the joint resource stranding calculations that drive the shift from installed MW to deployable capacity, their omission is load-bearing for the quantitative results.
minor comments (1)
- The abstract states high-level outcomes without any numerical values, confidence intervals, or table references; adding at least one concrete example (e.g., percentage change in deployable capacity for a 2027 density scenario) would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for acknowledging the strengths of grounding the framework in production traces and jointly modeling evolving factors. We agree that additional sensitivity and validation details will strengthen the quantitative claims. We respond to each major comment below and will revise the manuscript accordingly.
point-by-point responses
-
Referee: The central claim that multi-resource stranding materially changes deployable capacity, capex, and performance rests on the projection models correctly encoding inter-dependencies among electrical topology, deployment granularity, placement policy, oversubscription, and workload mix across evolving sequences. The manuscript provides no sensitivity analysis to alternative arrival/decommissioning sequences or workload-mix assumptions, nor any comparison against external traces, leaving open the possibility that reported stranding effects are artifacts of the chosen Microsoft-Azure-derived sequences rather than a general property of power hierarchies.
Authors: We acknowledge that the submitted manuscript does not present explicit sensitivity analyses to alternative sequences or workload-mix assumptions, nor direct comparisons to external traces. Our sequences were selected to reflect realistic inter-dependencies observed in Microsoft Azure production data. To address this, we will add a dedicated sensitivity analysis subsection in the revised manuscript. This will include variations in arrival/decommissioning rates (e.g., faster/slower AI accelerator ramps) and workload mixes (e.g., increased storage-to-GPU ratios drawn from public industry reports). We will also add a discussion of generalizability, referencing publicly available datacenter utilization statistics from other operators to support that the stranding effects are characteristic of high-density AI deployments rather than artifacts of our specific traces.
revision: yes
-
Referee: No validation details, error analysis, or robustness checks against variations in oversubscription factors and placement policies are supplied. Because these parameters directly affect the joint resource stranding calculations that drive the shift from installed MW to deployable capacity, their omission is load-bearing for the quantitative results.
Authors: We agree that the current manuscript lacks detailed validation, error analysis, and robustness checks on oversubscription and placement policies, which are central to the stranding calculations. The operational parameters are derived from Microsoft Azure traces, but these aspects were not expanded upon. In the revision we will augment the methods and evaluation sections with: (1) validation metrics comparing framework outputs to the source production data, including quantitative error bounds where the data permit; (2) robustness sweeps over oversubscription factors (1.1x–2.0x) and placement policies (e.g., random, power-aware, and affinity-based); and (3) explicit quantification of how these variations affect multi-resource stranding and the deployable-capacity metric. These additions will directly support the load-bearing quantitative results.
revision: yes
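A robustness sweep of the kind promised in the rebuttal could look roughly like the sketch below. It is an illustration only. It assumes that an oversubscription factor f lets a domain admit racks whose nameplate power sums to at most f times the domain capacity, and it uses two stand-in placement policies, `first_fit` and `least_loaded`, that are hypothetical and are not the policies named in the rebuttal. The rack mix and the 300 kW domain size are also made up.

```python
def place(racks_kw, n_domains, cap_kw, factor, policy):
    """Place racks by nameplate kW and return the total placed kW.

    Assumption: a domain admits nameplate load up to factor * cap_kw.
    policy is "first_fit" or "least_loaded"; both are illustrative stand-ins.
    """
    load = [0.0] * n_domains
    limit = factor * cap_kw
    for kw in racks_kw:
        if policy == "first_fit":
            order = range(n_domains)
        elif policy == "least_loaded":
            order = sorted(range(n_domains), key=lambda i: load[i])
        else:
            raise ValueError(f"unknown policy: {policy}")
        for i in order:
            if load[i] + kw <= limit:
                load[i] += kw
                break
        # A rack that fits nowhere is skipped, so it adds nothing to the total.
    return sum(load)


# Hypothetical arrival mix of GPU, compute, and storage racks (nameplate kW).
racks = [130.0, 60.0, 130.0, 15.0, 130.0, 60.0, 130.0, 15.0] * 3

for tenths in range(11, 21):  # oversubscription factors 1.1x through 2.0x
    factor = tenths / 10
    placed = {p: place(racks, 4, 300.0, factor, p)
              for p in ("first_fit", "least_loaded")}
    print(factor, placed)
```

Comparing the placed kW across factors and policies shows how sensitive a deployable-capacity number is to these two knobs. A real study would also need the power-capping risk that oversubscription introduces, which this sketch ignores.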
Circularity Check
No circularity: results from external-data simulation framework
full rationale
The paper presents a simulation framework that evaluates power delivery hierarchies by combining projection models for GPU/compute/storage deployments with operational factors from Microsoft Azure production data. It reports outcomes on stranding, capacity, capex and performance across arrival/oversubscription/decommissioning sequences. No equations, derivations or self-citations are exhibited that reduce any claimed result to fitted inputs or prior author work by construction. The central claim (deployable capacity over time, not installed MW) follows from the framework outputs rather than being tautological with its inputs. This is a standard empirical systems study whose load-bearing elements rest on external traces and models.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Production data from Microsoft Azure is representative of general datacenter operations and inter-dependencies.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance... the relevant planning objective is not installed megawatts, but deployable capacity over time."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Excerpts from the paper (Grant Wilkins, Fiodar Kazhamiaka, Alok Gautam Kumbhare, Chaojie Zhang, and Ricardo Bianchini)
- It is a first-order comparative model, not a topology-accurate runtime simulator.
- Communication is modeled with bandwidth-time approximations rather than collective-specific kernels.
- We do not model fine-grained overlap among TP communication, EP communication, and compute.
A.5 Workload Model Suite
Table 2 lists the model configurations used in the throughput study. The MoE suite spans three orders of magnitude in total parameters, from a 0.6 T model whose experts fit within a single rack-local NVLink domain to a 401 T model that ...
Storage racks are anchored at 15 kW in 2025 and grow at {2%, 4%, 6%} annually, reaching {18, 22, 26} kW by 2034. These trajectories define the non-GPU rack power inputs used by the SKU generation procedure. Unless otherwise ...
Table 3. Deployment architecture param...
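The storage-rack trajectory quoted above (15 kW in 2025, growing at 2%, 4%, or 6% annually) is a compound-growth projection, which can be sketched as follows. The function name is hypothetical. The excerpt does not say whether 2034 corresponds to nine or ten compounding steps, or how the quoted {18, 22, 26} kW endpoints were rounded, so the sketch prints both horizons and makes no claim about the paper's exact convention.

```python
def project_kw(base_kw, annual_growth, years):
    """Compound-growth projection: base_kw * (1 + annual_growth) ** years."""
    return base_kw * (1.0 + annual_growth) ** years


# 15 kW anchor in 2025; the number of compounding steps to 2034 is an assumption.
for growth in (0.02, 0.04, 0.06):
    nine = project_kw(15.0, growth, 9)
    ten = project_kw(15.0, growth, 10)
    print(f"{growth:.0%}: {nine:.1f} kW after 9 steps, {ten:.1f} kW after 10 steps")
```

Both horizons land close to the quoted endpoints, which suggests the figures in the excerpt are rounded values of this kind of projection.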