pith. sign in

arxiv: 2605.01837 · v1 · submitted 2026-05-03 · 💻 cs.DC · cs.SY· eess.SY· math.OC

nvPAX: Constrained Optimization for Dynamic Power Allocation in Hierarchical and Multi-Tenant Systems

Pith reviewed 2026-05-09 16:13 UTC · model grok-4.3

classification 💻 cs.DC cs.SYeess.SYmath.OC
keywords power allocationdatacenter optimizationconstrained optimizationhierarchical powermulti-tenant systemsdynamic allocationGPU power managementpower oversubscription
0
0 comments X

The pith

nvPAX allocates power in hierarchical multi-tenant datacenters using three-phase constrained optimization to reach 98.92% satisfaction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces nvPAX for dynamically allocating limited power in datacenters where distribution is hierarchical and tenants impose contractual constraints. It employs a hybrid quadratic and linear programming method in three sequential phases to first minimize deviations from device power requests while respecting priorities, then distribute excess power fairly to active devices, and finally to idle devices. This design supports power oversubscription to increase overall utilization. Trace-driven simulations from a production datacenter show the policy executes in a mean of 264.69 milliseconds per interval and attains a mean satisfaction ratio of 98.92 percent, surpassing static equal-share and greedy proportional allocations especially under uneven bottlenecks. Sympathetic readers would value this because growing power densities require smarter allocation to avoid waste and violations.

Core claim

nvPAX is a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.9

What carries the argument

The three-phase hybrid QP/LP procedure for computing allocations that respects hierarchies, priorities, and tenant constraints.

If this is right

  • Dynamic power allocation becomes practical in real-time for hierarchical systems without violating tenant contracts frequently.
  • Datacenter operators can oversubscribe power more aggressively while maintaining high utilization and satisfaction.
  • The method provides better robustness than static or simple greedy approaches when power bottlenecks vary across the hierarchy.
  • Allocation intervals can be handled at sub-second speeds suitable for ongoing operation.
  • Multi-tenant environments gain a tool to balance contractual obligations with overall efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar phased optimization could apply to allocating other constrained resources like network bandwidth or storage in multi-tenant clouds.
  • Live deployment might reveal opportunities to integrate with predictive models for demand forecasting.
  • Extending the phases to include energy efficiency metrics could further reduce operational costs.
  • Validation across different datacenter scales or hardware types would strengthen the case for adoption.

Load-bearing premise

The production traces accurately represent the real-time power request patterns, hierarchical bottleneck locations, and tenant constraints encountered in actual live operation.

What would settle it

Running nvPAX live on a production datacenter and observing a satisfaction ratio well below 98.92% or frequent failures to resolve hierarchical bottlenecks would indicate the simulation results do not hold in practice.

Figures

Figures reproduced from arXiv: 2605.01837 by Gil Shabat, Hadar Sivan, Yoel Shkolnisky.

Figure 1
Figure 1. Figure 1: Datacenter power and tenant layout illustrating hierarchical oversubscription. The tree view at source ↗
Figure 2
Figure 2. Figure 2: nvPAX vs. Static equal-share allocation. Left: satisfaction ratio. Right: relative utilization view at source ↗
Figure 3
Figure 3. Figure 3: Empirical scaling of nvPAX’s optimization time on synthetic random hierarchies. view at source ↗
Figure 4
Figure 4. Figure 4: Non-uniform hierarchy example. Rack A has a tight internal server view at source ↗
read the original abstract

Power oversubscription is increasingly central to datacenter operation as power density grows, making it necessary to dynamically allocate limited power budgets across devices based on real-time demand. Existing approaches typically assume flat power domains, whereas in practice power distribution is hierarchical and allocation decisions must additionally respect tenant-level contractual constraints. We present nvPAX, a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale behind the three phases is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%, outperforming static equal-share allocation and providing robustness beyond greedy proportional allocation in the presence of non-uniform hierarchical bottlenecks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces nvPAX, a constrained-optimization policy for dynamic power allocation under power oversubscription in hierarchical, multi-tenant datacenters. It employs a three-phase hybrid QP/LP procedure: Phase I minimizes deviation from device power requests while respecting job priorities; Phase II distributes excess power fairly among active devices; Phase III allocates any remainder to idle devices. The method is evaluated exclusively via trace-driven large-scale simulation on production GPU power telemetry, reporting a mean wall-clock time of 264.69 ms per interval and a mean satisfaction ratio of 98.92%, with claims of outperformance over static equal-share and greedy proportional baselines in the presence of non-uniform hierarchical bottlenecks.

Significance. If the simulation results prove robust, nvPAX would offer a practical, optimization-based alternative for improving datacenter power utilization while enforcing hierarchical limits and tenant contracts. The use of real production traces provides a concrete baseline for comparison, and the three-phase structure directly addresses oversubscription feasibility.

major comments (3)
  1. [Evaluation] Evaluation section: All performance numbers (98.92% mean satisfaction ratio, 264.69 ms runtime, outperformance claims) rest solely on replay of the given production traces. No sensitivity analysis, injected non-stationarities, or alternative bottleneck configurations are reported, leaving the robustness claim load-bearing on untested trace fidelity to live hierarchical and tenant patterns.
  2. [Method] Method description: The abstract and high-level procedure description supply no explicit mathematical formulation of the QP/LP objectives or constraints (e.g., no definition of the deviation objective, priority weights, or hierarchical bottleneck inequalities), preventing verification that the three phases are free of post-hoc adjustments or circular feasibility assumptions.
  3. [Evaluation] Evaluation section: The 98.92% satisfaction ratio is reported as a single mean without error bars, variance across traces, or per-tenant breakdowns, which undermines quantitative comparison to the baselines under non-uniform conditions.
minor comments (2)
  1. [Method] Clarify the exact encoding of tenant contractual constraints within the optimization phases and whether they are treated as hard or soft constraints.
  2. [Method] Add a brief complexity analysis or scaling discussion for the QP/LP solver as the number of devices and hierarchy depth increases.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and indicating planned revisions to strengthen the paper.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: All performance numbers (98.92% mean satisfaction ratio, 264.69 ms runtime, outperformance claims) rest solely on replay of the given production traces. No sensitivity analysis, injected non-stationarities, or alternative bottleneck configurations are reported, leaving the robustness claim load-bearing on untested trace fidelity to live hierarchical and tenant patterns.

    Authors: We agree that relying solely on replay of the provided production traces limits the demonstration of robustness. The traces are from a real production datacenter and capture non-uniform hierarchical bottlenecks and tenant patterns, which is why we chose them for evaluation. However, to address this concern, in the revised manuscript we will add a sensitivity analysis section. This will include experiments with injected non-stationarities (e.g., sudden demand spikes) and alternative bottleneck configurations by varying the hierarchical power limits. We believe this will better support the robustness claims. revision: yes

  2. Referee: [Method] Method description: The abstract and high-level procedure description supply no explicit mathematical formulation of the QP/LP objectives or constraints (e.g., no definition of the deviation objective, priority weights, or hierarchical bottleneck inequalities), preventing verification that the three phases are free of post-hoc adjustments or circular feasibility assumptions.

    Authors: The full manuscript in Section 3 provides the explicit mathematical formulations: the Phase I QP minimizes the sum of squared deviations weighted by job priorities subject to hierarchical power sum constraints and tenant SLAs; Phases II and III are LPs for fair distribution of excess and idle power. We acknowledge that the abstract and high-level overview in the introduction could better reference these. In the revision, we will add a brief mathematical summary to the abstract and ensure the procedure description points to the equations for clarity, allowing verification of the phases. revision: partial

  3. Referee: [Evaluation] Evaluation section: The 98.92% satisfaction ratio is reported as a single mean without error bars, variance across traces, or per-tenant breakdowns, which undermines quantitative comparison to the baselines under non-uniform conditions.

    Authors: We concur that a single mean value is insufficient for rigorous comparison. The revised manuscript will report the standard deviation and error bars for the satisfaction ratio across the trace intervals and traces. Additionally, we will include per-tenant satisfaction breakdowns and comparisons to baselines under different hierarchical conditions to highlight performance under non-uniformity. revision: yes

Circularity Check

0 steps flagged

No circularity detected; performance evaluated on independent external traces

full rationale

The paper proposes nvPAX as a three-phase hybrid QP/LP constrained optimization procedure for hierarchical power allocation. All reported performance numbers (mean satisfaction ratio of 98.92%, wall-clock time of 264.69 ms, and outperformance versus baselines) are computed by replaying the algorithm on production GPU telemetry traces collected from a real datacenter. These metrics are defined externally from the trace data and allocation feasibility outcomes rather than being constructed from the procedure's own fitted parameters, priorities, or internal variables. No self-definitional reductions, fitted inputs relabeled as predictions, uniqueness theorems, or load-bearing self-citations appear in the derivation or evaluation chain. The work is therefore self-contained as an algorithmic method whose claims rest on independent input data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method relies on standard QP/LP solvers and the assumption that system state (requests, hierarchy, priorities) is known at each step.

pith-pipeline@v0.9.0 · 5509 in / 1174 out tokens · 76597 ms · 2026-05-09T16:13:41.693551+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. [1]

    Carbon explorer: A holistic framework for designing carbon aware datacenters

    Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Udit Gupta, Manoj Chakkar- avarthy, David Brooks, and Carole-Jean Wu. Carbon explorer: A holistic framework for designing carbon aware datacenters. InProceedings of the 28th ACM International Confer- ence on Architectural Support for Programming Languages and Operating Systems, V olume 2, ASPLOS 20...

  2. [2]

    Data center scale prediction-based power reservation steering, 2024

    Nir Arad, Hadar Sivan, Gil Levy, Sridutt Bhalachandra, Larry Dennison, and Shie Mannor. Data center scale prediction-based power reservation steering, 2024. U.S. Patent Application No. 134580-1109 (NVD-109US), pending

  3. [3]

    The datacenter as a computer: An introduction to the design of warehouse-scale machines.Synthesis lectures on computer architecture, 8(3):1–154, 2013

    Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle. The datacenter as a computer: An introduction to the design of warehouse-scale machines.Synthesis lectures on computer architecture, 8(3):1–154, 2013

  4. [4]

    Apollo: Scalable and coordinated scheduling for Cloud-Scale computing

    Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou. Apollo: Scalable and coordinated scheduling for Cloud-Scale computing. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 285–300, Broomfield, CO, October 2014. USENIX Association

  5. [5]

    Ionel Gog, Malte Schwarzkopf, Adam Gleave, Robert N. M. Watson, and Steven Hand. Firma- ment: Fast, centralized cluster scheduling at scale. In12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 99–115, Savannah, GA, November 2016. USENIX Association

  6. [6]

    Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, and Chuanxiong Guo

    Juncheng Gu, Mosharaf Chowdhury, Kang G. Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, and Chuanxiong Guo. Tiresias: A GPU cluster manager for distributed deep learning. In16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 485–500, Boston, MA, February 2019. USENIX Association

  7. [7]

    Managing cost, performance, and reliability tradeoffs for energy-aware server provisioning

    Brian Guenter, Navendu Jain, and Charles Williams. Managing cost, performance, and reliability tradeoffs for energy-aware server provisioning. In2011 Proceedings IEEE INFOCOM, pages 1332–1340, 2011

  8. [8]

    Gurobi optimizer, 2026

    Gurobi Optimization, LLC. Gurobi optimizer, 2026

  9. [9]

    Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads

    Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 947–960, Renton, W A, July 2019. USENIX Association

  10. [10]

    Reinforcement learning for data center energy efficiency optimization: A systematic literature review and research roadmap.Applied Energy, 389:125734, 2025

    Hussain Kahil, Shiva Sharma, Petri Välisuo, and Mohammed Elmusrati. Reinforcement learning for data center energy efficiency optimization: A systematic literature review and research roadmap.Applied Energy, 389:125734, 2025

  11. [11]

    Tullsen, and Tajana Simunic Rosing

    Vasileios Kontorinis, Liuyi Eric Zhang, Baris Aksanli, Jack Sampson, Houman Homayoun, Eddie Pettis, Dean M. Tullsen, and Tajana Simunic Rosing. Managing distributed ups energy for effective power capping in data centers. In2012 39th Annual International Symposium on Computer Architecture (ISCA), pages 488–499, 2012. 13

  12. [12]

    Misra, Seyyed Ahmad Javadi, Bianca Schroeder, Marcus Fontoura, and Ricardo Bianchini

    Alok Gautam Kumbhare, Reza Azimi, Ioannis Manousakis, Anand Bonde, Felipe Frujeri, Nithish Mahalingam, Pulkit A. Misra, Seyyed Ahmad Javadi, Bianca Schroeder, Marcus Fontoura, and Ricardo Bianchini. Prediction-Based power oversubscription in cloud platforms. In2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 473–487. USENIX Association, July 2021

  13. [13]

    Thunderbolt: Throughput-Optimized, Quality-of-Service- Aware power capping at scale

    Shaohong Li, Xi Wang, Xiao Zhang, Vasileios Kontorinis, Sreekumar Kodakara, David Lo, and Parthasarathy Ranganathan. Thunderbolt: Throughput-Optimized, Quality-of-Service- Aware power capping at scale. In14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 1241–1255. USENIX Association, November 2020

  14. [14]

    Themis: Fair and efficient GPU cluster scheduling

    Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, and Shuchi Chawla. Themis: Fair and efficient GPU cluster scheduling. In17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 289–304, Santa Clara, CA, February 2020. USENIX Association

  15. [15]

    Scalable data center power management via a global stress signal

    Daniel Miller, Neal Master, Zhengyuan Zhou, and Nicholas Bambos. Scalable data center power management via a global stress signal. In2015 IEEE Global Communications Conference (GLOBECOM), pages 1–7, 2015

  16. [16]

    Kinetic power capping using fuzzy logic-based dynamic system prioritization

    Rishi Mukherjee, Shivendra Katiyar, Lori Lynn Matthews, and Elie Antoun Jreij. Kinetic power capping using fuzzy logic-based dynamic system prioritization. U.S. Patent US20240126360A1,

  17. [17]

    Heterogeneity-Aware cluster scheduling policies for deep learning workloads

    Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, and Matei Zaharia. Heterogeneity-Aware cluster scheduling policies for deep learning workloads. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 481–498. USENIX Association, November 2020

  18. [18]

    Datacenter energy optimized power profiles, 2025

    Sreedhar Narayanaswamy, Pratikkumar Dilipkumar Patel, Ian Karlin, Apoorv Gupta, Sudhir Saripalli, and Janey Guo. Datacenter energy optimized power profiles, 2025

  19. [19]

    Nvidia domain power service (dps), 2026

    NVIDIA Corporation. Nvidia domain power service (dps), 2026

  20. [20]

    Polca: Power oversubscription in llm cloud providers.arXiv preprint arXiv:2308.12908, 2023

    Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, and Ricardo Bianchini. Polca: Power oversubscription in llm cloud providers.arXiv preprint arXiv:2308.12908, 2023

  21. [21]

    Characterizing power management opportunities for llms in the cloud

    Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, and Ricardo Bianchini. Characterizing power management opportunities for llms in the cloud. Proceedings of the 29th ACM International Conference on Architectural Support for Program- ming Languages and Operating Systems, V olume 3, 2024

  22. [22]

    Wenisch, and Jack Underwood

    Steven Pelley, David Meisner, Pooya Zandevakili, Thomas F. Wenisch, and Jack Underwood. Power routing: dynamic power provisioning in the data center. InProceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XV , page 231–242, New York, NY , USA, 2010. Association for Computing...

  23. [23]

    Optimus: an efficient dynamic resource scheduler for deep learning clusters

    Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, and Chuanxiong Guo. Optimus: an efficient dynamic resource scheduler for deep learning clusters. InProceedings of the Thirteenth EuroSys Conference, EuroSys ’18, New York, NY , USA, 2018. Association for Computing Machinery

  24. [24]

    Ganger, and Eric P

    Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, and Eric P. Xing. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pages 1–18. USENIX Association, July 2021

  25. [25]

    Parm: Adaptive resource allocation for datacenter power capping

    Haoran Qiu, Linghao Zhang, Chen Wang2 Hubertus Franke, Zbigniew T Kalbarczyk, and Ravishankar K Iyer. Parm: Adaptive resource allocation for datacenter power capping. In Machine Learning for Systems Workshop at the Annual Conference on Neural Information Processing Systems (NeurIPS 2023), 2023. 14

  26. [26]

    Mullen, Kendal Smith, MariEllen Cottman, and Walfredo Cirne

    Ana Radovanovic, Ross Koningstein, Ian Schneider, Bokan Chen, Alexandre Nobrega Duarte, Binz Roy, Diyue Xiao, Maya Haridasan, Patrick Hung, Nick Care, Saurav Talukdar, E. Mullen, Kendal Smith, MariEllen Cottman, and Walfredo Cirne. Carbon-aware computing for datacen- ters.IEEE Transactions on Power Systems, 38:1270–1280, 2021

  27. [27]

    Data center power oversubscription with a medium voltage power plane and priority-aware capping

    Varun Sakalkar, Vasileios Kontorinis, David Landhuis, Shaohong Li, Darren De Ronde, Thomas Blooming, Anand Ramesh, James Kennedy, Christopher Malone, Jimmy Clidaras, and Parthasarathy Ranganathan. Data center power oversubscription with a medium voltage power plane and priority-aware capping. InProceedings of the 25th International Conference on Architect...

  28. [28]

    Tapas: Thermal-and power-aware scheduling for llm inference in cloud platforms

    Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, and Ricardo Bianchini. Tapas: Thermal-and power-aware scheduling for llm inference in cloud platforms. InProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, V olume 2, pages 126...

  29. [29]

    Xiaorui Wang, Ming Chen, Charles Lefurgy, and Tom W. Keller. Ship: Scalable hierarchical power control for large-scale data centers. In2009 18th International Conference on Parallel Architectures and Compilation Techniques, pages 91–100, 2009

  30. [30]

    Transparent GPU sharing in container clouds for deep learning workloads

    Bingyang Wu, Zili Zhang, Zhihao Bai, Xuanzhe Liu, and Xin Jin. Transparent GPU sharing in container clouds for deep learning workloads. In20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 69–85, Boston, MA, April 2023. USENIX Association

  31. [31]

    Dynamo: Facebook’s data center-wide power management system

    Qiang Wu, Qingyuan Deng, Lakshmi Ganesh, Chang-Hong Hsu, Yun Jin, Sanjeev Kumar, Bin Li, Justin Meza, and Yee Jiun Song. Dynamo: Facebook’s data center-wide power management system. In2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 469–480, 2016

  32. [32]

    Gandiva: Introspective cluster scheduling for deep learning

    Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. Gandiva: Introspective cluster scheduling for deep learning. In13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 595–610, Carlsbad, CA, October ...

  33. [33]

    Shock- wave: Fair and efficient cluster scheduling for dynamic adaptation in machine learning

    Pengfei Zheng, Rui Pan, Tarannum Khan, Shivaram Venkataraman, and Aditya Akella. Shock- wave: Fair and efficient cluster scheduling for dynamic adaptation in machine learning. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 703–723, Boston, MA, April 2023. USENIX Association. 15 A Greedy Proportional Allocation vs....