Energy Efficient Scheduling of AI/ML Workloads on Multi Instance GPUs with Dynamic Repartitioning
Pith reviewed 2026-06-25 22:31 UTC · model grok-4.3
The pith
Dynamic repartitioning of Multi-Instance GPUs via reinforcement learning cuts combined energy use and job tardiness by 26 to 68 percent versus static or infrequent alternatives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents a dynamic repartitioning scheduling framework for a single MIG as a solution to a multi-objective heterogeneous machine scheduling problem with preemption. Four scheduling algorithms are compared and one is selected; reinforcement learning is then applied to choose MIG configurations over time. On a diurnal workload derived from real data center traces the learned policy improves the combined energy-tardiness objective by 26 percent over twice-daily repartitioning, 31 percent over static partitioning, and 68 percent over no partitioning, while also revealing time-of-day preferences for particular slice sizes.
What carries the argument
Reinforcement learning policy that selects both job assignments and MIG slice configurations at each time step to optimize the joint energy-plus-tardiness objective.
If this is right
- The dynamic policy improves the multi-objective score by 31 percent relative to any fixed MIG slice size.
- Twice-daily repartitioning is outperformed by 26 percent on the same combined metric.
- Running without any repartitioning is outperformed by 68 percent.
- Preferred slice sizes shift predictably with time of day and instantaneous queue length.
- The learned preferences can be turned into a simple predictive rule for automatic daily reconfiguration.
Where Pith is reading between the lines
- Operators could lower peak power draw in GPU clusters by applying the same time-of-day slice schedule without buying new hardware.
- The same reinforcement learning loop could be tested on other sliceable accelerators once their performance and preemption models are available.
- Adding a short-term workload forecast as extra input to the RL agent might further reduce the remaining tardiness penalty.
- Production rollout would still need to verify that real preemption costs match the values used in the simulator.
Load-bearing premise
The repeating daily workload pattern taken from existing traces will continue to describe future production traffic and the simulator correctly reproduces MIG slice behavior and preemption overhead.
What would settle it
Running the learned dynamic policy on live MIG hardware with real AI/ML jobs for a full day and measuring actual energy draw and job completion delays against the same baselines used in simulation.
Figures
read the original abstract
Increasing demand from AI/ML workloads is exacerbating the rising energy consumption of data centers. Recent advances in hardware such as NVIDIA's Multi Instance GPUs (MIGs) offer improvements in flexibility and computational power and the opportunity for data centers to manage incoming jobs in energy-efficient ways, while maintaining acceptable performance. The challenge in achieving this multi-objective in a MIG environment through job scheduling is multi-faceted. Firstly, for a given MIG configuration, one seeks an easy-to-implement scheduling algorithm which selects a job from the queue as well as decides on which slice in the configuration the job runs. Secondly, for the identified scheduling algorithm, a particular MIG configuration may not always be suitable (as the workload fluctuates) and may need to be repartitioned. We tackle both problems using simulations and reinforcement learning (RL). We present a dynamic repartitioning scheduling framework for a single MIG as a solution to a multi-objective heterogeneous machine scheduling problem with preemption. In particular, we compare four scheduling algorithms and identify a promising one. Then, we employ reinforcement learning to perform dynamic repartitioning over a day. Furthermore, using a diurnal workload pattern based on real-world data center traces, we demonstrate the superiority of our dynamic repartitioning algorithm over twice-daily repartitioning ($26\%$), static partitioning ($31\%$) and no partitioning at all ($68\%$) according to a multi-objective function of energy consumption and tardiness. Our results indicate specific preferred configurations at different times of the day under different queue conditions, suggesting a policy for predictive and automatic reconfiguration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a dynamic repartitioning scheduling framework for Multi-Instance GPUs (MIGs) using reinforcement learning to address multi-objective optimization of energy consumption and job tardiness for AI/ML workloads. It evaluates four scheduling algorithms via simulation and then applies RL for dynamic MIG repartitioning over a day, claiming 26%, 31%, and 68% improvements over twice-daily repartitioning, static partitioning, and no partitioning, respectively, based on a multi-objective function using diurnal workload patterns from real-world data center traces.
Significance. If the simulation model were validated and the workload trace shown to be robust, the approach could offer a practical method for energy-aware scheduling on heterogeneous GPU hardware by identifying time-of-day preferred MIG configurations. The combination of discrete-event simulation with RL for repartitioning decisions is a reasonable direction for multi-objective heterogeneous machine scheduling.
major comments (4)
- [Abstract] Abstract: the multi-objective function of energy consumption and tardiness is never defined (no weights, normalization, or aggregation method), yet the central quantitative claims of 26%, 31%, and 68% superiority rest entirely on this undefined metric.
- [Abstract] Abstract: no error bars, number of simulation replications, or statistical significance tests accompany the reported percentage improvements, so the reliability of the superiority margins cannot be assessed.
- [Abstract] Abstract: the discrete-event simulator's models for MIG slice allocation, preemption latency, and power draw receive no hardware calibration or sensitivity analysis, yet the effect sizes (26-68%) are smaller than plausible modeling errors in these quantities.
- [Abstract] Abstract: the single diurnal arrival pattern extracted from data-center traces is used without cross-trace validation or ablation on job-size distribution or arrival-rate perturbations, undermining the claim that the learned policy's advantage is general.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We address each major comment below and will revise the manuscript to strengthen the presentation of the multi-objective metric, statistical reporting, simulator robustness, and workload generality.
read point-by-point responses
-
Referee: [Abstract] Abstract: the multi-objective function of energy consumption and tardiness is never defined (no weights, normalization, or aggregation method), yet the central quantitative claims of 26%, 31%, and 68% superiority rest entirely on this undefined metric.
Authors: We agree the abstract omits an explicit definition. We will revise the abstract to state that the multi-objective function is a weighted sum of min-max normalized energy consumption and tardiness (with equal weights of 0.5), aggregated as their arithmetic mean. This definition will be added directly to the abstract while preserving the reported improvements. revision: yes
-
Referee: [Abstract] Abstract: no error bars, number of simulation replications, or statistical significance tests accompany the reported percentage improvements, so the reliability of the superiority margins cannot be assessed.
Authors: We acknowledge the absence of these details in the abstract. The simulations underlying the 26/31/68% figures were performed with multiple replications; we will add error bars, state the replication count, and note that the reported differences are statistically significant in the revised abstract. revision: yes
-
Referee: [Abstract] Abstract: the discrete-event simulator's models for MIG slice allocation, preemption latency, and power draw receive no hardware calibration or sensitivity analysis, yet the effect sizes (26-68%) are smaller than plausible modeling errors in these quantities.
Authors: This point is valid. The models follow NVIDIA MIG specifications and published GPU power models, but no hardware calibration or sensitivity analysis appears in the current version. We will add a dedicated sensitivity analysis varying preemption latency and power-draw parameters by ±20% and demonstrate that the relative advantages of dynamic repartitioning remain stable. revision: yes
-
Referee: [Abstract] Abstract: the single diurnal arrival pattern extracted from data-center traces is used without cross-trace validation or ablation on job-size distribution or arrival-rate perturbations, undermining the claim that the learned policy's advantage is general.
Authors: We used one representative diurnal trace to isolate time-of-day effects. To strengthen generality claims we will incorporate an ablation study perturbing arrival rates and job-size distributions, plus results on at least one additional trace from the same data-center dataset, confirming that the RL policy retains its advantage. revision: yes
Circularity Check
No circularity in derivation chain; evaluation is simulation-driven
full rationale
The paper reports empirical simulation results comparing an RL dynamic-repartitioning policy against fixed baselines on a multi-objective metric, using a diurnal arrival pattern extracted from traces. No equations, derivations, or self-citations appear in the provided text that would reduce any claimed result to its own inputs by construction. The performance margins (26/31/68 %) are outputs of the discrete-event simulator rather than tautological re-statements of fitted parameters or imported uniqueness theorems. The central claims therefore rest on modeling assumptions about MIG behavior and trace representativeness, which fall outside the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
AI is pushing the world toward an energy crisis. forbes.,
Cohen, A., “AI is pushing the world toward an energy crisis. forbes.,” 2024. https://www.forbes.com/sites/arielcohen/2024/05/23/ ai-is-pushing-the-world-towards-an-energy-crisis/
2024
-
[2]
Performance-aware energy-efficient GPU frequency selection using DNN-based models,
G. Ali, M. Side, S. Bhalachandra, N. J. Wright, and Y . Chen, “Performance-aware energy-efficient GPU frequency selection using DNN-based models,” inProceedings of the 52nd International Con- ference on Parallel Processing, pp. 433–442, 2023
2023
-
[3]
Managing queues with heteroge- neous servers,
J. H. Kim, H.-S. Ahn, and R. Righter, “Managing queues with heteroge- neous servers,”Journal of Applied Probability, vol. 48, no. 2, pp. 435– 452, 2011
2011
-
[4]
Approximations in performance analysis of a controllable queueing system with heteroge- neous servers,
D. Efrosinin, N. Stepanova, J. Sztrik, and A. Plank, “Approximations in performance analysis of a controllable queueing system with heteroge- neous servers,”Mathematics, vol. 8, no. 10, 2020
2020
-
[5]
Complexity of preemptive minsum scheduling on unrelated parallel machines,
R. Sitters, “Complexity of preemptive minsum scheduling on unrelated parallel machines,”Journal of Algorithms, vol. 57, no. 1, pp. 37–48, 2005
2005
-
[6]
Paris and elsa: an elastic scheduling al- gorithm for reconfigurable multi-gpu inference servers,
Y . Kim, Y . Choi, and M. Rhu, “Paris and elsa: an elastic scheduling al- gorithm for reconfigurable multi-gpu inference servers,” inProceedings of the 59th ACM/IEEE Design Automation Conference, DAC ’22, (New York, NY , USA), p. 607–612, Association for Computing Machinery, 2022
2022
-
[7]
Serv- ing dnn models with multi-instance gpus: A case of the reconfigurable machine scheduling problem,
C. Tan, Z. Li, J. Zhang, Y . Cao, S. Qi, Z. Liu, Y . Zhu, and C. Guo, “Serv- ing dnn models with multi-instance gpus: A case of the reconfigurable machine scheduling problem,” 2021
2021
-
[8]
Clover: Toward sus- tainable AI with carbon-aware machine learning inference service,
B. Li, S. Samsi, V . Gadepally, and D. Tiwari, “Clover: Toward sus- tainable AI with carbon-aware machine learning inference service,” inProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–15, 2023
2023
-
[9]
Great power, great responsibility: Recommendations for reducing en- ergy for training language models,
J. McDonald, B. Li, N. Frey, D. Tiwari, V . Gadepally, and S. Samsi, “Great power, great responsibility: Recommendations for reducing en- ergy for training language models,” inFindings of the Association for Computational Linguistics: NAACL 2022, pp. 1962–1970, 2022
2022
-
[10]
Zeus: Understanding and optimizing GPU energy consumption of DNN training,
J. You, J.-W. Chung, and M. Chowdhury, “Zeus: Understanding and optimizing GPU energy consumption of DNN training,” in20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pp. 119–139, 2023
2023
-
[11]
Reinforcement learning applications to machine scheduling problems: a comprehensive literature review,
G. Y . Behice Meltem Kayhan, “Reinforcement learning applications to machine scheduling problems: a comprehensive literature review,” Journal of Intelligent Manufacturing, vol. 34, pp. 905–929, 2023
2023
-
[12]
Resource man- agement with deep reinforcement learning,
H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Resource man- agement with deep reinforcement learning,” inProceedings of the 15th ACM workshop on hot topics in networks, pp. 50–56, 2016
2016
-
[13]
Hierarchical resource partitioning on modern GPUs: A reinforcement learning approach,
U. Saroliya, E. Arima, D. Liu, and M. Schulz, “Hierarchical resource partitioning on modern GPUs: A reinforcement learning approach,” in 2023 IEEE International Conference on Cluster Computing (CLUSTER), pp. 185–196, 2023
2023
-
[14]
A deep reinforcement learning-based task scheduling algorithm for energy efficiency in data centers,
P. Song, C. Chi, K. Ji, Z. Liu, F. Zhang, S. Zhang, D. Qiu, and X. Wan, “A deep reinforcement learning-based task scheduling algorithm for energy efficiency in data centers,”2021 International Conference on Computer Communications and Networks (ICCCN), pp. 1–9, 2021
2021
-
[15]
Toward efficient compute-intensive job allocation for green data centers: A deep reinforcement learning approach,
D. Yi, X. Zhou, Y . Wen, and R. Tan, “Toward efficient compute-intensive job allocation for green data centers: A deep reinforcement learning approach,” in2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pp. 634–644, 2019
2019
-
[16]
Characterizing training performance and energy for foundation models and image classifiers on multi-instance GPUs,
C. Espenshade, R. Peng, E. Hong, M. Calman, Y . Zhu, P. Parida, E. K. Lee, and M. A. Kim, “Characterizing training performance and energy for foundation models and image classifiers on multi-instance GPUs,” inProceedings of the 4th Workshop on Machine Learning and Systems, EuroMLSys ’24, (New York, NY , USA), p. 47–55, Association for Computing Machinery, 2024
2024
-
[17]
MISO: Exploiting multi-instance GPU capability on multi-tenant GPU clusters,
B. Li, T. Patel, S. Samsi, V . Gadepally, and D. Tiwari, “MISO: Exploiting multi-instance GPU capability on multi-tenant GPU clusters,” inProceedings of the 13th Symposium on Cloud Computing, SoCC ’22, (New York, NY , USA), pp. 173–189, Association for Computing Machinery, 2022
2022
-
[18]
Characterizing multi- instance GPU for machine learning workloads,
B. Li, V . Gadepally, S. Samsi, and D. Tiwari, “Characterizing multi- instance GPU for machine learning workloads,” in2022 IEEE Inter- national Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 724–731, IEEE, 2022
2022
-
[19]
C. Tan, Z. Li, J. Zhang, Y . Cao, S. Qi, Z. Liu, Y . Zhu, and C. Guo, “Serv- ing DNN models with multi-instance GPUs: A case of the reconfigurable machine scheduling problem,”arXiv preprint arXiv:2109.11067, 2021
arXiv 2021
-
[20]
Parallel-machine scheduling to mini- mize tardiness penalty and power cost,
K.-T. Fang and B. M. T. Lin, “Parallel-machine scheduling to mini- mize tardiness penalty and power cost,”Comput. Ind. Eng., vol. 64, p. 224–234, jan 2013
2013
-
[21]
Deep reinforcement learning: An overview,
Y . Li, “Deep reinforcement learning: An overview,” 2018
2018
-
[22]
MLaaS in the wild: Workload analysis and scheduling in Large-Scale heterogeneous GPU clusters,
Q. Weng, W. Xiao, Y . Yu, W. Wang, C. Wang, J. He, Y . Li, L. Zhang, W. Lin, and Y . Ding, “MLaaS in the wild: Workload analysis and scheduling in Large-Scale heterogeneous GPU clusters,” in19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pp. 945–960, 2022
2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.