nvPAX: Constrained Optimization for Dynamic Power Allocation in Hierarchical and Multi-Tenant Systems

Gil Shabat; Hadar Sivan; Yoel Shkolnisky

arXiv: 2605.01837 · v1 · submitted 2026-05-03 · cs.DC · cs.SY · eess.SY · math.OC

Pith reviewed 2026-05-09 16:13 UTC · model grok-4.3

keywords: power allocation, datacenter optimization, constrained optimization, hierarchical power, multi-tenant systems, dynamic allocation, GPU power management, power oversubscription

The pith

nvPAX allocates power in hierarchical, multi-tenant datacenters using a three-phase constrained optimization, reaching a mean satisfaction ratio of 98.92% in trace-driven simulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces nvPAX for dynamically allocating limited power in datacenters where distribution is hierarchical and tenants impose contractual constraints. It employs a hybrid quadratic and linear programming method in three sequential phases: first minimizing deviations from device power requests while respecting priorities, then distributing excess power fairly among active devices, and finally distributing any remainder to idle devices. This design supports power oversubscription to increase overall utilization. In trace-driven simulations using GPU power telemetry from a production datacenter, the policy executes in a mean of 264.69 milliseconds per allocation interval and attains a mean satisfaction ratio of 98.92 percent, outperforming static equal-share allocation and proving more robust than greedy proportional allocation under non-uniform hierarchical bottlenecks. Sympathetic readers would value this because growing power densities require smarter allocation to avoid wasted capacity and limit violations.

Core claim

nvPAX is a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%.

What carries the argument

The three-phase hybrid QP/LP procedure for computing allocations that respects hierarchies, priorities, and tenant constraints.
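
To make the phase ordering concrete, here is a minimal sketch under strong simplifying assumptions: a single flat power budget instead of the hierarchy and tenant constraints the paper targets, a bisection on the Lagrange multiplier standing in for a QP solver in Phase I, and a max-min water-filling loop standing in for the LPs of Phases II and III. All names (`phase1_min_deviation`, `fair_share`, `three_phase`, `caps`, `active`) are hypothetical and are not taken from the paper.

```python
def phase1_min_deviation(requests, weights, budget, iters=100):
    """Minimize sum_i w_i * (x_i - r_i)^2 subject to sum_i x_i <= budget, x_i >= 0.

    The KKT conditions give x_i = max(0, r_i - lam / (2 * w_i)); we bisect on lam >= 0.
    Assumes weights > 0 and budget >= 0 (illustrative flat-budget stand-in for a QP).
    """
    if sum(requests) <= budget:
        return [float(r) for r in requests]

    def alloc(lam):
        return [max(0.0, r - lam / (2.0 * w)) for r, w in zip(requests, weights)]

    lo, hi = 0.0, max(2.0 * w * r for r, w in zip(requests, weights))
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(alloc(mid)) > budget:
            lo = mid
        else:
            hi = mid  # hi always yields a feasible allocation
    return alloc(hi)


def fair_share(headroom, surplus):
    """Max-min fair split of `surplus` across devices, limited by per-device headroom."""
    extra = [0.0] * len(headroom)
    open_idx = [i for i, h in enumerate(headroom) if h > 0]
    while open_idx and surplus > 1e-12:
        share = surplus / len(open_idx)
        still_open = []
        for i in open_idx:
            give = min(share, headroom[i] - extra[i])
            extra[i] += give
            surplus -= give
            if headroom[i] - extra[i] > 1e-12:
                still_open.append(i)
        open_idx = still_open
    return extra


def three_phase(requests, weights, caps, active, budget):
    # Phase I: stay as close as possible to requests, weighted by priority.
    x = phase1_min_deviation(requests, weights, budget)
    # Phase II: spread leftover power fairly over active devices only.
    head = [c - xi if a else 0.0 for c, xi, a in zip(caps, x, active)]
    x = [xi + e for xi, e in zip(x, fair_share(head, budget - sum(x)))]
    # Phase III: whatever still remains goes to idle devices.
    head = [0.0 if a else c - xi for c, xi, a in zip(caps, x, active)]
    return [xi + e for xi, e in zip(x, fair_share(head, budget - sum(x)))]


# Hypothetical inputs: two active devices and one idle device under a 1000 W budget.
print(three_phase([300, 300, 0], [1, 3, 1], [400, 400, 400], [True, True, False], 1000))
```

When requests exceed the budget, the device with the higher weight is cut less in Phase I, which is one simple reading of "respecting job priorities"; the paper's actual priority handling may differ.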

If this is right

Dynamic power allocation becomes practical in real time for hierarchical systems without frequently violating tenant contracts.
Datacenter operators can oversubscribe power more aggressively while maintaining high utilization and satisfaction.
The method provides better robustness than static or simple greedy approaches when power bottlenecks vary across the hierarchy.
Allocation intervals can be handled at sub-second speeds suitable for ongoing operation.
Multi-tenant environments gain a tool to balance contractual obligations with overall efficiency.
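
The robustness point above can be illustrated with a toy two-rack hierarchy. This is a sketch under stated assumptions, not the paper's experiment: the rack limits, the requests, the definitions of the two baselines, and the satisfaction ratio (mean of `min(alloc, request) / request` over devices) are all hypothetical choices made for illustration, and `bottleneck_aware` is a simple stand-in for a hierarchy-aware policy, not nvPAX itself.

```python
ROOT_BUDGET = 1000.0
# Rack A has a tight limit relative to its demand; rack B does not.
racks = {
    "A": {"limit": 300.0, "requests": [250.0, 250.0]},
    "B": {"limit": 900.0, "requests": [350.0, 350.0]},
}


def satisfaction(alloc, requests):
    """Assumed metric: mean fraction of each (positive) request that is met."""
    return sum(min(a, r) / r for a, r in zip(alloc, requests)) / len(requests)


def static_equal_share(racks, root_budget):
    """Every device gets the same fixed share, ignoring requests."""
    n = sum(len(r["requests"]) for r in racks.values())
    return {
        name: [min(root_budget / n, r["limit"] / len(r["requests"]))] * len(r["requests"])
        for name, r in racks.items()
    }


def greedy_proportional(racks, root_budget):
    """Scale requests at the root, then clip per rack; clipped power is not reassigned."""
    total = sum(sum(r["requests"]) for r in racks.values())
    scale = min(1.0, root_budget / total)
    out = {}
    for name, r in racks.items():
        want = [scale * q for q in r["requests"]]
        clip = min(1.0, r["limit"] / sum(want))
        out[name] = [clip * w for w in want]
    return out


def bottleneck_aware(racks, root_budget):
    """Cap each rack's demand by its own limit first, then share the root budget."""
    capped = {n: min(r["limit"], sum(r["requests"])) for n, r in racks.items()}
    scale = min(1.0, root_budget / sum(capped.values()))
    out = {}
    for name, r in racks.items():
        frac = scale * capped[name] / sum(r["requests"])
        out[name] = [frac * q for q in r["requests"]]
    return out


def report(policy):
    alloc = policy(racks, ROOT_BUDGET)
    flat_alloc = [a for name in racks for a in alloc[name]]
    flat_req = [q for name in racks for q in racks[name]["requests"]]
    return satisfaction(flat_alloc, flat_req), sum(flat_alloc)


for policy in (static_equal_share, greedy_proportional, bottleneck_aware):
    ratio, used = report(policy)
    print(f"{policy.__name__}: satisfaction={ratio:.3f}, power used={used:.1f} W")
```

In this toy setup the proportional baseline strands roughly 117 W that rack A cannot absorb, while the bottleneck-aware variant hands that power to rack B and uses the full 1000 W.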

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar phased optimization could apply to allocating other constrained resources like network bandwidth or storage in multi-tenant clouds.
Live deployment might reveal opportunities to integrate with predictive models for demand forecasting.
Extending the phases to include energy efficiency metrics could further reduce operational costs.
Validation across different datacenter scales or hardware types would strengthen the case for adoption.

Load-bearing premise

The production traces accurately represent the real-time power request patterns, hierarchical bottleneck locations, and tenant constraints encountered in actual live operation.

What would settle it

Running nvPAX live on a production datacenter and observing a satisfaction ratio well below 98.92% or frequent failures to resolve hierarchical bottlenecks would indicate the simulation results do not hold in practice.

Figures

Figures reproduced from arXiv: 2605.01837 by Gil Shabat, Hadar Sivan, Yoel Shkolnisky.

**Figure 1.** Datacenter power and tenant layout illustrating hierarchical oversubscription. The tree…

**Figure 2.** nvPAX vs. static equal-share allocation. Left: satisfaction ratio. Right: relative utilization.

**Figure 3.** Empirical scaling of nvPAX's optimization time on synthetic random hierarchies.

**Figure 4.** Non-uniform hierarchy example. Rack A has a tight internal server…

Original abstract

Power oversubscription is increasingly central to datacenter operation as power density grows, making it necessary to dynamically allocate limited power budgets across devices based on real-time demand. Existing approaches typically assume flat power domains, whereas in practice power distribution is hierarchical and allocation decisions must additionally respect tenant-level contractual constraints. We present nvPAX, a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale behind the three phases is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%, outperforming static equal-share allocation and providing robustness beyond greedy proportional allocation in the presence of non-uniform hierarchical bottlenecks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

nvPAX provides a concrete three-phase optimization for hierarchical power allocation that performs well in trace simulations but needs more robustness testing to support its claims.

nvPAX gives a practical three-phase optimization procedure for allocating power in hierarchical, multi-tenant datacenters. It uses a mix of quadratic and linear programming to first meet requests as closely as possible under priorities, then share any extra power fairly among active devices, and finally among idle ones. The simulation on real production traces shows it runs in under 300 ms on average and reaches about 99% satisfaction, beating simpler allocation schemes.

This is new because most existing power allocation work assumes flat domains without tenant contracts. The authors have built something that respects the actual structure of power distribution and contractual limits, which matters as power density increases and oversubscription becomes necessary. The paper does a decent job describing the rationale for the three phases and showing how it improves utilization. The trace-driven results provide some evidence that it can work at scale.

The main weakness is that the performance claims depend entirely on the fidelity of those production traces. There is no testing of how the method behaves if demand is burstier or if hierarchical bottlenecks are in different places than in the traces. Without that, it's hard to know if the high satisfaction ratio would hold in live operation. Also, the abstract leaves out the precise constraint definitions and problem formulations, which makes it tougher to verify the approach without digging into the full text.

This paper is for systems researchers focused on datacenter resource management and power efficiency. Someone working on similar allocation problems would get concrete ideas from the phased solver and the constraint handling. It is solid enough to deserve a serious referee, as the core idea addresses a real operational need. I would recommend putting it through peer review, with the expectation that the authors add sensitivity analysis and perhaps some live validation or more detailed math in the revisions.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces nvPAX, a constrained-optimization policy for dynamic power allocation under power oversubscription in hierarchical, multi-tenant datacenters. It employs a three-phase hybrid QP/LP procedure: Phase I minimizes deviation from device power requests while respecting job priorities; Phase II distributes excess power fairly among active devices; Phase III allocates any remainder to idle devices. The method is evaluated exclusively via trace-driven large-scale simulation on production GPU power telemetry, reporting a mean wall-clock time of 264.69 ms per interval and a mean satisfaction ratio of 98.92%, with claims of outperformance over static equal-share and greedy proportional baselines in the presence of non-uniform hierarchical bottlenecks.

Significance. If the simulation results prove robust, nvPAX would offer a practical, optimization-based alternative for improving datacenter power utilization while enforcing hierarchical limits and tenant contracts. The use of real production traces provides a concrete baseline for comparison, and the three-phase structure directly addresses oversubscription feasibility.

major comments (3)

[Evaluation] Evaluation section: All performance numbers (98.92% mean satisfaction ratio, 264.69 ms runtime, outperformance claims) rest solely on replay of the given production traces. No sensitivity analysis, injected non-stationarities, or alternative bottleneck configurations are reported, leaving the robustness claim load-bearing on untested trace fidelity to live hierarchical and tenant patterns.
[Method] Method description: The abstract and high-level procedure description supply no explicit mathematical formulation of the QP/LP objectives or constraints (e.g., no definition of the deviation objective, priority weights, or hierarchical bottleneck inequalities), preventing verification that the three phases are free of post-hoc adjustments or circular feasibility assumptions.
[Evaluation] Evaluation section: The 98.92% satisfaction ratio is reported as a single mean without error bars, variance across traces, or per-tenant breakdowns, which undermines quantitative comparison to the baselines under non-uniform conditions.
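
The kind of perturbation the first evaluation comment asks for can be sketched as follows. This is a hypothetical harness, not something described in the paper: `inject_spikes` and `tighten_limits` are illustrative names, a trace is assumed to be a list of per-interval lists of device power requests in watts, and the spike model (independent per-sample multiplicative spikes, clipped at a device cap) is just one simple choice.

```python
import random


def inject_spikes(trace, spike_prob, spike_factor, device_cap, seed=0):
    """Return a copy of `trace` where each sample is multiplied by `spike_factor`
    with probability `spike_prob`, clipped at `device_cap` (hypothetical spike model)."""
    rng = random.Random(seed)  # seeded so perturbed runs are reproducible
    perturbed = []
    for step in trace:
        perturbed.append([
            min(device_cap, p * spike_factor) if rng.random() < spike_prob else p
            for p in step
        ])
    return perturbed


def tighten_limits(limits, factor):
    """Scale every hierarchy limit by `factor` to move or sharpen bottlenecks."""
    return {node: cap * factor for node, cap in limits.items()}


# Hypothetical two-interval, two-device trace and a single rack limit.
trace = [[100.0, 200.0], [300.0, 400.0]]
print(inject_spikes(trace, spike_prob=0.5, spike_factor=2.0, device_cap=700.0))
print(tighten_limits({"rackA": 300.0}, 0.5))
```

Replaying the allocator on several perturbed copies, for a sweep of `spike_prob` and `factor` values, would show how quickly the satisfaction ratio degrades away from the recorded traces.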

minor comments (2)

[Method] Clarify the exact encoding of tenant contractual constraints within the optimization phases and whether they are treated as hard or soft constraints.
[Method] Add a brief complexity analysis or scaling discussion for the QP/LP solver as the number of devices and hierarchy depth increases.
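
For orientation, one plausible way to write a Phase I problem consistent with the abstract's description (minimum deviation from requests, priorities, hierarchical limits, tenant constraints) is sketched below. Every symbol is an assumption introduced here, not the paper's notation: $x_i$ is the power allocated to device $i$, $r_i$ its request, $w_i > 0$ a priority weight, $\mathcal{D}(n)$ the devices under hierarchy node $n$ with limit $P_n$, and $\mathcal{D}(t)$ the devices of tenant $t$ with contractual cap $C_t$. Writing the tenant terms as inequalities treats them as hard constraints, which is exactly the point the first minor comment asks the authors to clarify.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i \in \mathcal{D}} w_i \,(x_i - r_i)^2 \\
\text{s.t.}\quad
& \sum_{i \in \mathcal{D}(n)} x_i \le P_n && \forall n \in \mathcal{N} \quad \text{(hierarchical limits)} \\
& \sum_{i \in \mathcal{D}(t)} x_i \le C_t && \forall t \in \mathcal{T} \quad \text{(tenant constraints, assumed form)} \\
& x_i^{\min} \le x_i \le x_i^{\max} && \forall i \in \mathcal{D} \quad \text{(device bounds)}
\end{aligned}
```

With a fixed hierarchy the number of constraints grows with the number of nodes and tenants, and the number of variables with the number of devices, which is the scaling the second minor comment asks to be discussed.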

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and indicating planned revisions to strengthen the paper.

Referee: [Evaluation] Evaluation section: All performance numbers (98.92% mean satisfaction ratio, 264.69 ms runtime, outperformance claims) rest solely on replay of the given production traces. No sensitivity analysis, injected non-stationarities, or alternative bottleneck configurations are reported, leaving the robustness claim load-bearing on untested trace fidelity to live hierarchical and tenant patterns.

Authors: We agree that relying solely on replay of the provided production traces limits the demonstration of robustness. The traces are from a real production datacenter and capture non-uniform hierarchical bottlenecks and tenant patterns, which is why we chose them for evaluation. However, to address this concern, in the revised manuscript we will add a sensitivity analysis section. This will include experiments with injected non-stationarities (e.g., sudden demand spikes) and alternative bottleneck configurations by varying the hierarchical power limits. We believe this will better support the robustness claims.

revision: yes

Referee: [Method] Method description: The abstract and high-level procedure description supply no explicit mathematical formulation of the QP/LP objectives or constraints (e.g., no definition of the deviation objective, priority weights, or hierarchical bottleneck inequalities), preventing verification that the three phases are free of post-hoc adjustments or circular feasibility assumptions.

Authors: The full manuscript in Section 3 provides the explicit mathematical formulations: the Phase I QP minimizes the sum of squared deviations weighted by job priorities subject to hierarchical power sum constraints and tenant SLAs; Phases II and III are LPs for fair distribution of excess and idle power. We acknowledge that the abstract and high-level overview in the introduction could better reference these. In the revision, we will add a brief mathematical summary to the abstract and ensure the procedure description points to the equations for clarity, allowing verification of the phases.

revision: partial

Referee: [Evaluation] Evaluation section: The 98.92% satisfaction ratio is reported as a single mean without error bars, variance across traces, or per-tenant breakdowns, which undermines quantitative comparison to the baselines under non-uniform conditions.

Authors: We concur that a single mean value is insufficient for rigorous comparison. The revised manuscript will report the standard deviation and error bars for the satisfaction ratio across the trace intervals and traces. Additionally, we will include per-tenant satisfaction breakdowns and comparisons to baselines under different hierarchical conditions to highlight performance under non-uniformity.

revision: yes
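
The dispersion statistics discussed in this exchange are cheap to compute. The sketch below assumes one satisfaction ratio per allocation interval is available; the sample values are hypothetical placeholders, not data from the paper, and the 95% interval uses a normal approximation that treats intervals as independent.

```python
import math
import statistics


def summarize(ratios):
    """Return (mean, sample std dev, 95% normal-approximation CI) for a list of ratios."""
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)  # sample standard deviation (n - 1 denominator)
    half_width = 1.96 * sd / math.sqrt(len(ratios))
    return mean, sd, (mean - half_width, mean + half_width)


# Hypothetical per-interval satisfaction ratios, for illustration only.
ratios = [0.97, 0.99, 1.0, 0.98, 0.99]
mean, sd, (lo, hi) = summarize(ratios)
print(f"mean={mean:.4f} sd={sd:.4f} ci95=({lo:.4f}, {hi:.4f})")
```

Consecutive intervals in a power trace are typically correlated, so an interval computed this way will tend to be too narrow; grouping by trace or by tenant before summarizing gives a more honest picture.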

Circularity Check

0 steps flagged

No circularity detected; performance evaluated on independent external traces

The paper proposes nvPAX as a three-phase hybrid QP/LP constrained optimization procedure for hierarchical power allocation. All reported performance numbers (mean satisfaction ratio of 98.92%, wall-clock time of 264.69 ms, and outperformance versus baselines) are computed by replaying the algorithm on production GPU telemetry traces collected from a real datacenter. These metrics are defined externally from the trace data and allocation feasibility outcomes rather than being constructed from the procedure's own fitted parameters, priorities, or internal variables. No self-definitional reductions, fitted inputs relabeled as predictions, uniqueness theorems, or load-bearing self-citations appear in the derivation or evaluation chain. The work is therefore self-contained as an algorithmic method whose claims rest on independent input data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method relies on standard QP/LP solvers and the assumption that system state (requests, hierarchy, priorities) is known at each step.


[1] [1]

Carbon explorer: A holistic framework for designing carbon aware datacenters

Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Udit Gupta, Manoj Chakkar- avarthy, David Brooks, and Carole-Jean Wu. Carbon explorer: A holistic framework for designing carbon aware datacenters. InProceedings of the 28th ACM International Confer- ence on Architectural Support for Programming Languages and Operating Systems, V olume 2, ASPLOS 20...

work page 2023

[2] [2]

Data center scale prediction-based power reservation steering, 2024

Nir Arad, Hadar Sivan, Gil Levy, Sridutt Bhalachandra, Larry Dennison, and Shie Mannor. Data center scale prediction-based power reservation steering, 2024. U.S. Patent Application No. 134580-1109 (NVD-109US), pending

work page 2024

[3] [3]

The datacenter as a computer: An introduction to the design of warehouse-scale machines.Synthesis lectures on computer architecture, 8(3):1–154, 2013

Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle. The datacenter as a computer: An introduction to the design of warehouse-scale machines.Synthesis lectures on computer architecture, 8(3):1–154, 2013

work page 2013

[4] [4]

Apollo: Scalable and coordinated scheduling for Cloud-Scale computing

Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou. Apollo: Scalable and coordinated scheduling for Cloud-Scale computing. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 285–300, Broomfield, CO, October 2014. USENIX Association

work page 2014

[5] [5]

Ionel Gog, Malte Schwarzkopf, Adam Gleave, Robert N. M. Watson, and Steven Hand. Firma- ment: Fast, centralized cluster scheduling at scale. In12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 99–115, Savannah, GA, November 2016. USENIX Association

work page 2016

[6] [6]

Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, and Chuanxiong Guo

Juncheng Gu, Mosharaf Chowdhury, Kang G. Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, and Chuanxiong Guo. Tiresias: A GPU cluster manager for distributed deep learning. In16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 485–500, Boston, MA, February 2019. USENIX Association

work page 2019

[7] [7]

Managing cost, performance, and reliability tradeoffs for energy-aware server provisioning

Brian Guenter, Navendu Jain, and Charles Williams. Managing cost, performance, and reliability tradeoffs for energy-aware server provisioning. In2011 Proceedings IEEE INFOCOM, pages 1332–1340, 2011

work page 2011

[8] [8]

Gurobi optimizer, 2026

Gurobi Optimization, LLC. Gurobi optimizer, 2026

work page 2026

[9] [9]

Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads

Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 947–960, Renton, W A, July 2019. USENIX Association

work page 2019

[10] [10]

Reinforcement learning for data center energy efficiency optimization: A systematic literature review and research roadmap.Applied Energy, 389:125734, 2025

Hussain Kahil, Shiva Sharma, Petri Välisuo, and Mohammed Elmusrati. Reinforcement learning for data center energy efficiency optimization: A systematic literature review and research roadmap.Applied Energy, 389:125734, 2025

work page 2025

[11] [11]

Tullsen, and Tajana Simunic Rosing

Vasileios Kontorinis, Liuyi Eric Zhang, Baris Aksanli, Jack Sampson, Houman Homayoun, Eddie Pettis, Dean M. Tullsen, and Tajana Simunic Rosing. Managing distributed ups energy for effective power capping in data centers. In2012 39th Annual International Symposium on Computer Architecture (ISCA), pages 488–499, 2012. 13

work page 2012

[12] Alok Gautam Kumbhare, Reza Azimi, Ioannis Manousakis, Anand Bonde, Felipe Frujeri, Nithish Mahalingam, Pulkit A. Misra, Seyyed Ahmad Javadi, Bianca Schroeder, Marcus Fontoura, and Ricardo Bianchini. Prediction-Based power oversubscription in cloud platforms. In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 473–487. USENIX Association, July 2021.

[13] Shaohong Li, Xi Wang, Xiao Zhang, Vasileios Kontorinis, Sreekumar Kodakara, David Lo, and Parthasarathy Ranganathan. Thunderbolt: Throughput-Optimized, Quality-of-Service-Aware power capping at scale. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 1241–1255. USENIX Association, November 2020.

[14] Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, and Shuchi Chawla. Themis: Fair and efficient GPU cluster scheduling. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 289–304, Santa Clara, CA, February 2020. USENIX Association.

[15] Daniel Miller, Neal Master, Zhengyuan Zhou, and Nicholas Bambos. Scalable data center power management via a global stress signal. In 2015 IEEE Global Communications Conference (GLOBECOM), pages 1–7, 2015.

[16] Rishi Mukherjee, Shivendra Katiyar, Lori Lynn Matthews, and Elie Antoun Jreij. Kinetic power capping using fuzzy logic-based dynamic system prioritization. U.S. Patent US20240126360A1,

[17] Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, and Matei Zaharia. Heterogeneity-Aware cluster scheduling policies for deep learning workloads. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 481–498. USENIX Association, November 2020.

[18] Sreedhar Narayanaswamy, Pratikkumar Dilipkumar Patel, Ian Karlin, Apoorv Gupta, Sudhir Saripalli, and Janey Guo. Datacenter energy optimized power profiles, 2025.

[19] NVIDIA Corporation. NVIDIA Domain Power Service (DPS), 2026.

[20] Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, and Ricardo Bianchini. Polca: Power oversubscription in LLM cloud providers. arXiv preprint arXiv:2308.12908, 2023.

[21] Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, and Ricardo Bianchini. Characterizing power management opportunities for LLMs in the cloud. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 2024.

[22] Steven Pelley, David Meisner, Pooya Zandevakili, Thomas F. Wenisch, and Jack Underwood. Power routing: dynamic power provisioning in the data center. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XV, pages 231–242, New York, NY, USA, 2010. Association for Computing...

[23] Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, and Chuanxiong Guo. Optimus: an efficient dynamic resource scheduler for deep learning clusters. In Proceedings of the Thirteenth EuroSys Conference, EuroSys ’18, New York, NY, USA, 2018. Association for Computing Machinery.

[24] Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, and Eric P. Xing. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pages 1–18. USENIX Association, July 2021.

[25] Haoran Qiu, Linghao Zhang, Chen Wang, Hubertus Franke, Zbigniew T Kalbarczyk, and Ravishankar K Iyer. Parm: Adaptive resource allocation for datacenter power capping. In Machine Learning for Systems Workshop at the Annual Conference on Neural Information Processing Systems (NeurIPS 2023), 2023.

[26] Ana Radovanovic, Ross Koningstein, Ian Schneider, Bokan Chen, Alexandre Nobrega Duarte, Binz Roy, Diyue Xiao, Maya Haridasan, Patrick Hung, Nick Care, Saurav Talukdar, E. Mullen, Kendal Smith, MariEllen Cottman, and Walfredo Cirne. Carbon-aware computing for datacenters. IEEE Transactions on Power Systems, 38:1270–1280, 2021.

[27] Varun Sakalkar, Vasileios Kontorinis, David Landhuis, Shaohong Li, Darren De Ronde, Thomas Blooming, Anand Ramesh, James Kennedy, Christopher Malone, Jimmy Clidaras, and Parthasarathy Ranganathan. Data center power oversubscription with a medium voltage power plane and priority-aware capping. In Proceedings of the 25th International Conference on Architect...

[28] Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, and Ricardo Bianchini. Tapas: Thermal- and power-aware scheduling for LLM inference in cloud platforms. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 126...

[29] Xiaorui Wang, Ming Chen, Charles Lefurgy, and Tom W. Keller. Ship: Scalable hierarchical power control for large-scale data centers. In 2009 18th International Conference on Parallel Architectures and Compilation Techniques, pages 91–100, 2009.

[30] Bingyang Wu, Zili Zhang, Zhihao Bai, Xuanzhe Liu, and Xin Jin. Transparent GPU sharing in container clouds for deep learning workloads. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 69–85, Boston, MA, April 2023. USENIX Association.

[31] Qiang Wu, Qingyuan Deng, Lakshmi Ganesh, Chang-Hong Hsu, Yun Jin, Sanjeev Kumar, Bin Li, Justin Meza, and Yee Jiun Song. Dynamo: Facebook’s data center-wide power management system. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pages 469–480, 2016.

[32] Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. Gandiva: Introspective cluster scheduling for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 595–610, Carlsbad, CA, October ...

[33] Pengfei Zheng, Rui Pan, Tarannum Khan, Shivaram Venkataraman, and Aditya Akella. Shockwave: Fair and efficient cluster scheduling for dynamic adaptation in machine learning. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 703–723, Boston, MA, April 2023. USENIX Association.

A Greedy Proportional Allocation vs....