nvPAX: Constrained Optimization for Dynamic Power Allocation in Hierarchical and Multi-Tenant Systems
Pith reviewed 2026-05-09 16:13 UTC · model grok-4.3
The pith
nvPAX allocates power in hierarchical, multi-tenant datacenters with a three-phase constrained optimization and reaches a 98.92% mean satisfaction ratio in trace-driven simulation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
nvPAX is a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%.
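To make the Phase I idea concrete, here is a minimal sketch of a priority-weighted least-squares allocation on a single flat budget, solved by bisection on the Lagrange multiplier. It is not the paper's formulation: it ignores the hierarchy and tenant constraints, and the function name `weighted_min_deviation`, the weights, and the bound `0 <= x_i <= r_i` are all assumptions made for illustration.

```python
from typing import List


def weighted_min_deviation(requests: List[float], weights: List[float],
                           budget: float, iters: int = 100) -> List[float]:
    """Minimize sum_i w_i * (x_i - r_i)^2 s.t. sum_i x_i <= budget, 0 <= x_i <= r_i.

    Toy stand-in for a Phase I-style step on ONE flat budget (no hierarchy,
    no tenant constraints). Weights must be positive and act as priorities.
    """
    if sum(requests) <= budget:
        return list(requests)  # every request fits, so zero deviation is optimal

    def alloc(lam: float) -> List[float]:
        # KKT stationarity: x_i = r_i - lam / (2 * w_i), clipped to [0, r_i].
        return [min(r, max(0.0, r - lam / (2.0 * w)))
                for r, w in zip(requests, weights)]

    # At lam = max(2 * w_i * r_i) every allocation is clipped to zero.
    lo, hi = 0.0, max(2.0 * w * r for r, w in zip(requests, weights))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > budget:
            lo = mid  # still over budget: raise the multiplier
        else:
            hi = mid  # within budget: try a smaller multiplier
    return alloc(hi)
```

With requests of 300 W each, weights 3 and 1, and a 400 W budget, the higher-priority device is cut less: the sketch returns roughly 250 W and 150 W.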
What carries the argument
The three-phase hybrid QP/LP procedure for computing allocations that respects hierarchies, priorities, and tenant constraints.
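What "respecting the hierarchy" means can be illustrated with a small feasibility check: every node in the power-distribution tree must draw no more than its cap. The sketch below is an assumption-laden illustration, not the paper's data model; the dictionary layout, the keys `cap`, `children`, and `devices`, and the function `check_caps` are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical topology: each node has a power cap (watts) plus child nodes
# and/or device ids. None of these names come from the paper.
Node = Dict[str, Any]


def check_caps(node: Node, alloc: Dict[str, float],
               path: str = "root") -> Tuple[float, List[str]]:
    """Return (power drawn under this node, paths of nodes whose cap is exceeded)."""
    total = sum(alloc[d] for d in node.get("devices", []))
    violations: List[str] = []
    for name, child in node.get("children", {}).items():
        child_total, child_violations = check_caps(child, alloc, f"{path}/{name}")
        total += child_total
        violations.extend(child_violations)
    if total > node["cap"] + 1e-9:
        violations.append(path)
    return total, violations


datacenter = {
    "cap": 1000.0,
    "children": {
        "rack_a": {"cap": 500.0, "devices": ["gpu0", "gpu1"]},
        "rack_b": {"cap": 700.0, "devices": ["gpu2", "gpu3"]},
    },
}
allocation = {"gpu0": 300.0, "gpu1": 250.0, "gpu2": 200.0, "gpu3": 200.0}
total, violations = check_caps(datacenter, allocation)
```

In this example the datacenter-level cap holds (950 W of 1000 W) but `rack_a` is over its 500 W cap, which is the kind of non-uniform bottleneck a flat-domain allocator cannot see.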
If this is right
- Dynamic power allocation becomes practical in real time for hierarchical systems without frequently violating tenant contracts.
- Datacenter operators can oversubscribe power more aggressively while maintaining high utilization and satisfaction.
- The method provides better robustness than static or simple greedy approaches when power bottlenecks vary across the hierarchy.
- Each allocation interval can be solved in well under a second (mean 264.69 ms in simulation), fast enough for continuous operation.
- Multi-tenant environments gain a tool to balance contractual obligations with overall efficiency.
Where Pith is reading between the lines
- Similar phased optimization could apply to allocating other constrained resources like network bandwidth or storage in multi-tenant clouds.
- Live deployment might reveal opportunities to integrate with predictive models for demand forecasting.
- Extending the phases to include energy efficiency metrics could further reduce operational costs.
- Validation across different datacenter scales or hardware types would strengthen the case for adoption.
Load-bearing premise
The production traces accurately represent the real-time power request patterns, hierarchical bottleneck locations, and tenant constraints encountered in actual live operation.
What would settle it
Running nvPAX live on a production datacenter and observing a satisfaction ratio well below 98.92% or frequent failures to resolve hierarchical bottlenecks would indicate the simulation results do not hold in practice.
Original abstract
Power oversubscription is increasingly central to datacenter operation as power density grows, making it necessary to dynamically allocate limited power budgets across devices based on real-time demand. Existing approaches typically assume flat power domains, whereas in practice power distribution is hierarchical and allocation decisions must additionally respect tenant-level contractual constraints. We present nvPAX, a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale behind the three phases is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%, outperforming static equal-share allocation and providing robustness beyond greedy proportional allocation in the presence of non-uniform hierarchical bottlenecks.
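The two baselines named in the abstract, and a satisfaction metric, can be sketched as follows. This is one plausible reading, not the paper's definitions: `proportional` scales all requests by a common factor on a single flat budget, and `mean_satisfaction` assumes the ratio is `min(allocation, request) / request` averaged over devices with a nonzero request. All function names are hypothetical.

```python
from typing import List


def equal_share(requests: List[float], budget: float) -> List[float]:
    """Static baseline: every device gets budget / n regardless of demand."""
    share = budget / len(requests)
    return [share for _ in requests]


def proportional(requests: List[float], budget: float) -> List[float]:
    """Demand-proportional baseline: scale all requests by one common factor."""
    total = sum(requests)
    if total <= budget:
        return list(requests)
    scale = budget / total
    return [r * scale for r in requests]


def mean_satisfaction(requests: List[float], alloc: List[float]) -> float:
    """Assumed metric: mean of min(alloc, request) / request over devices
    with a nonzero request. The paper's exact definition may differ."""
    ratios = [min(a, r) / r for r, a in zip(requests, alloc) if r > 0]
    return sum(ratios) / len(ratios) if ratios else 1.0
```

Neither baseline knows about rack-level or tenant-level limits, so an allocation they produce may still need to be clipped at an intermediate node of the hierarchy.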
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces nvPAX, a constrained-optimization policy for dynamic power allocation under power oversubscription in hierarchical, multi-tenant datacenters. It employs a three-phase hybrid QP/LP procedure: Phase I minimizes deviation from device power requests while respecting job priorities; Phase II distributes excess power fairly among active devices; Phase III allocates any remainder to idle devices. The method is evaluated exclusively via trace-driven large-scale simulation on production GPU power telemetry, reporting a mean wall-clock time of 264.69 ms per interval and a mean satisfaction ratio of 98.92%, with claims of outperformance over static equal-share and greedy proportional baselines in the presence of non-uniform hierarchical bottlenecks.
Significance. If the simulation results prove robust, nvPAX would offer a practical, optimization-based alternative for improving datacenter power utilization while enforcing hierarchical limits and tenant contracts. The use of real production traces provides a concrete baseline for comparison, and the three-phase structure directly addresses oversubscription feasibility.
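The "fair distribution of excess power" in Phase II can be pictured with max-min fair water-filling, a common notion of fairness. Whether nvPAX uses this exact notion is not stated in the material above; the sketch below is an assumption, with `water_fill` and the per-device `headroom` (how much more power each device could still accept) introduced only for illustration.

```python
from typing import List


def water_fill(excess: float, headroom: List[float]) -> List[float]:
    """Max-min fair split of `excess` among devices, each limited by its headroom."""
    grant = [0.0] * len(headroom)
    active = [i for i, h in enumerate(headroom) if h > 0]
    remaining = excess
    while active and remaining > 1e-12:
        share = remaining / len(active)
        # Devices that cannot absorb an equal share are filled to their limit.
        saturated = [i for i in active if headroom[i] - grant[i] <= share]
        if not saturated:
            for i in active:
                grant[i] += share
            break
        for i in saturated:
            remaining -= headroom[i] - grant[i]
            grant[i] = headroom[i]
        active = [i for i in active if i not in saturated]
    return grant
```

For 100 W of excess and headrooms of 20 W, 50 W, and 200 W, the first device saturates at 20 W and the other two split the remaining 80 W equally.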
major comments (3)
- [Evaluation] Evaluation section: All performance numbers (98.92% mean satisfaction ratio, 264.69 ms runtime, outperformance claims) rest solely on replay of the given production traces. No sensitivity analysis, injected non-stationarities, or alternative bottleneck configurations are reported, leaving the robustness claim load-bearing on untested trace fidelity to live hierarchical and tenant patterns.
- [Method] Method description: The abstract and high-level procedure description supply no explicit mathematical formulation of the QP/LP objectives or constraints (e.g., no definition of the deviation objective, priority weights, or hierarchical bottleneck inequalities), preventing verification that the three phases are free of post-hoc adjustments or circular feasibility assumptions.
- [Evaluation] Evaluation section: The 98.92% satisfaction ratio is reported as a single mean without error bars, variance across traces, or per-tenant breakdowns, which undermines quantitative comparison to the baselines under non-uniform conditions.
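The kind of reporting asked for in the last major comment can be sketched with the Python standard library. The helper `mean_with_ci` is hypothetical and uses a normal-approximation 95% interval; per-interval samples from a trace are likely autocorrelated, so this interval would tend to be optimistic.

```python
import math
import statistics
from typing import List, Tuple


def mean_with_ci(samples: List[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, sample std dev, half-width of a normal-approximation CI)."""
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples) if len(samples) > 1 else 0.0
    half_width = z * std / math.sqrt(len(samples))
    return mean, std, half_width


# Illustrative per-interval satisfaction ratios, NOT values from the paper.
ratios = [0.98, 1.0, 0.99, 0.97]
mean, std, half_width = mean_with_ci(ratios)
```

Running the same helper once per tenant would give the per-tenant breakdown the referee requests.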
minor comments (2)
- [Method] Clarify the exact encoding of tenant contractual constraints within the optimization phases and whether they are treated as hard or soft constraints.
- [Method] Add a brief complexity analysis or scaling discussion for the QP/LP solver as the number of devices and hierarchy depth increases.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and indicating planned revisions to strengthen the paper.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: All performance numbers (98.92% mean satisfaction ratio, 264.69 ms runtime, outperformance claims) rest solely on replay of the given production traces. No sensitivity analysis, injected non-stationarities, or alternative bottleneck configurations are reported, leaving the robustness claim load-bearing on untested trace fidelity to live hierarchical and tenant patterns.
  Authors: We agree that relying solely on replay of the provided production traces limits the demonstration of robustness. The traces are from a real production datacenter and capture non-uniform hierarchical bottlenecks and tenant patterns, which is why we chose them for evaluation. However, to address this concern, in the revised manuscript we will add a sensitivity analysis section. This will include experiments with injected non-stationarities (e.g., sudden demand spikes) and alternative bottleneck configurations by varying the hierarchical power limits. We believe this will better support the robustness claims.
  Revision: yes
- Referee: [Method] Method description: The abstract and high-level procedure description supply no explicit mathematical formulation of the QP/LP objectives or constraints (e.g., no definition of the deviation objective, priority weights, or hierarchical bottleneck inequalities), preventing verification that the three phases are free of post-hoc adjustments or circular feasibility assumptions.
  Authors: The full manuscript in Section 3 provides the explicit mathematical formulations: the Phase I QP minimizes the sum of squared deviations weighted by job priorities subject to hierarchical power sum constraints and tenant SLAs; Phases II and III are LPs for fair distribution of excess and idle power. We acknowledge that the abstract and high-level overview in the introduction could better reference these. In the revision, we will add a brief mathematical summary to the abstract and ensure the procedure description points to the equations for clarity, allowing verification of the phases.
  Revision: partial
- Referee: [Evaluation] Evaluation section: The 98.92% satisfaction ratio is reported as a single mean without error bars, variance across traces, or per-tenant breakdowns, which undermines quantitative comparison to the baselines under non-uniform conditions.
  Authors: We concur that a single mean value is insufficient for rigorous comparison. The revised manuscript will report the standard deviation and error bars for the satisfaction ratio across the trace intervals and traces. Additionally, we will include per-tenant satisfaction breakdowns and comparisons to baselines under different hierarchical conditions to highlight performance under non-uniformity.
  Revision: yes
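For orientation, the Phase I problem as the authors describe it in their second response (squared deviations weighted by job priorities, subject to hierarchical power sums and tenant SLAs) could be written roughly as below. This is a sketch under stated assumptions, not the manuscript's Section 3: the symbols $x_i$ (allocation), $r_i$ (request), $w_i$ (priority weight), $C_n$ (cap of hierarchy node $n$ over its devices $D_n$), $S_t$ (an assumed minimum aggregate power for tenant $t$ over its devices $D_t$), and the per-device bounds are all introduced here for illustration.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i \in D} w_i \,(x_i - r_i)^2 \\
\text{s.t.}\quad & \sum_{i \in D_n} x_i \le C_n && \forall n \in N \quad \text{(hierarchical caps)} \\
& \sum_{i \in D_t} x_i \ge S_t && \forall t \in T \quad \text{(tenant SLA, assumed form)} \\
& \underline{p}_i \le x_i \le \overline{p}_i && \forall i \in D \quad \text{(device limits)}
\end{aligned}
```

With $w_i > 0$ the objective is strictly convex and the constraints are linear, so whenever the feasible set is nonempty the minimizer is unique. Whether the tenant constraints are hard or soft is exactly what the first minor comment asks the authors to clarify.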
Circularity Check
No circularity detected; performance evaluated on independent external traces
The paper proposes nvPAX as a three-phase hybrid QP/LP constrained optimization procedure for hierarchical power allocation. All reported performance numbers (mean satisfaction ratio of 98.92%, wall-clock time of 264.69 ms, and outperformance versus baselines) are computed by replaying the algorithm on production GPU telemetry traces collected from a real datacenter. These metrics are defined externally from the trace data and allocation feasibility outcomes rather than being constructed from the procedure's own fitted parameters, priorities, or internal variables. No self-definitional reductions, fitted inputs relabeled as predictions, uniqueness theorems, or load-bearing self-citations appear in the derivation or evaluation chain. The work is therefore self-contained as an algorithmic method whose claims rest on independent input data.