Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers
Pith reviewed 2026-05-10 15:38 UTC · model grok-4.3
The pith
Mixing batch and inference workloads in AI data centers decouples power variability from short-horizon ramping
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In shared-GPU systems the composition of batch and inference workloads decouples aggregate power variability from short-horizon ramping. As the inference share rises, variability becomes U-shaped whereas ramping becomes hump-shaped, particularly under higher loading. The underlying mechanism is asymmetric: at intermediate workload mixes queued batch jobs fill capacity left idle by fluctuating inference demand, reducing aggregate power variability; short-horizon ramping remains elevated because inference-side fluctuations propagate more directly into realized power.
What carries the argument
Asymmetric buffering by queued batch jobs of fluctuations in inference demand within shared-GPU systems
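The buffering mechanism can be illustrated with a toy simulation. The sketch below is not the paper's trace-calibrated framework: the function names, the Gaussian inference demand with a 30% relative spread, the steady batch arrivals, and the use of busy GPUs as a power proxy are all assumptions made for illustration. Each step, inference is served first and queued batch work backfills whatever capacity is left idle.

```python
import random
import statistics

def simulate_power(inference_share, load=0.9, capacity=100, steps=2000, seed=0):
    """Toy shared-GPU pool (illustrative assumptions, not the paper's model).

    Returns the number of busy GPUs per step, used here as a power proxy.
    """
    rng = random.Random(seed)
    mean_inf = inference_share * load * capacity               # mean inference demand
    batch_arrival = (1.0 - inference_share) * load * capacity  # steady batch inflow
    batch_queue = 0.0                                          # backlog in GPU-steps
    power = []
    for _ in range(steps):
        # Inference demand fluctuates and is served first, up to capacity.
        inf_demand = max(0.0, rng.gauss(mean_inf, 0.3 * mean_inf))
        inf_used = min(float(capacity), inf_demand)
        # Batch work joins the queue, then backfills the idle GPUs.
        batch_queue += batch_arrival
        batch_used = min(batch_queue, capacity - inf_used)
        batch_queue -= batch_used
        power.append(inf_used + batch_used)
    return power

def variability(power):
    """Population standard deviation of the power proxy."""
    return statistics.pstdev(power)

def mean_abs_step_change(power):
    """Mean absolute one-step change, a simple short-horizon ramp proxy."""
    return statistics.fmean([abs(b - a) for a, b in zip(power, power[1:])])

if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5, 0.75, 1.0):
        p = simulate_power(share)
        print(share, round(variability(p), 2), round(mean_abs_step_change(p), 2))
```

Sweeping `inference_share` and `load` lets the two metrics be compared side by side. This toy should not be expected to reproduce the paper's U- and hump-shaped curves, which come from the calibrated model.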
Load-bearing premise
The trace-calibrated model of arrivals, queues, scheduling and GPU power accurately reproduces the asymmetric effect of inference fluctuations on total power without missing hardware-state or contention effects.
What would settle it
Power traces collected from a real shared-GPU cluster while the inference-to-batch ratio is deliberately varied would show whether variability follows a U-shape and ramping follows a hump-shape.
Original abstract
Artificial intelligence (AI) is driving rapid growth in electricity demand, yet the grid-facing power dynamics of AI data centers remain poorly understood. Here we show that, in shared-GPU systems, the composition of batch and inference workloads decouples aggregate power variability from short-horizon ramping. As the inference share rises, variability becomes U-shaped, whereas ramping becomes hump-shaped, particularly under higher loading. The magnitude and turning points of these patterns also depend on system loading. Using a trace-calibrated framework linking workload arrivals, queueing, scheduling, and GPU power, we show that the underlying mechanism is asymmetric. At intermediate workload mixes, queued batch jobs fill capacity left idle by fluctuating inference demand, reducing aggregate power variability. However, short-horizon ramping remains elevated because inference-side fluctuations propagate more directly into realized power. AI data centers should therefore be understood as dynamic systems whose workload composition shapes their grid impact.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that in shared-GPU AI data centers the mix of batch and inference workloads decouples aggregate power variability from short-horizon ramping: as the inference fraction rises, variability follows a U-shape while ramping follows a hump-shape (especially at high load). The mechanism is asymmetric—queued batch jobs fill inference-induced idle slots to smooth variability, yet inference fluctuations propagate more directly into realized power ramps. All results are obtained from a trace-calibrated discrete-event simulation that links arrivals, queueing, scheduling, and GPU power mapping.
Significance. If the simulation framework is shown to reproduce measured power traces, the decoupling result would be a useful contribution to the still-sparse literature on AI data-center grid dynamics. It supplies a concrete, workload-composition-based explanation for why variability and ramping need not move together, which could inform both data-center scheduling policies and grid-operator forecasting.
major comments (2)
- [Methods / Simulation Framework] The central claim rests entirely on the trace-calibrated framework (described in the methods section). No quantitative comparison to measured power traces from operating AI clusters is presented, nor are error bars or sensitivity checks to calibration parameters or unmodeled effects (DVFS transitions, network contention) reported. This leaves the reported U- and hump-shaped patterns without external grounding.
- [Results] Results section: the qualitative shapes are shown for varying inference shares and load levels, but the manuscript supplies neither statistical significance tests on the turning points nor robustness checks when the underlying arrival traces or scheduling policy parameters are perturbed. These omissions are load-bearing because the decoupling conclusion is presented as a general system property rather than a model-specific observation.
minor comments (2)
- [Abstract] Abstract: the statement that “the magnitude and turning points … also depend on system loading” is not accompanied by any numerical values or figure references, making it difficult for a reader to assess the practical size of the effect.
- [Methods] Notation: the precise time horizon used to define “short-horizon ramps” (e.g., 1 min, 5 min) should be stated explicitly when the metric is first introduced.
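The request for an explicit ramp horizon can be made concrete. This is a minimal sketch, assuming power is sampled at a fixed interval; the function names `ramps` and `max_abs_ramp` and the 60 s and 120 s values in the example are hypothetical, and the paper may define its ramp metric differently.

```python
def ramps(power, sample_seconds, horizon_seconds):
    """Power change over a fixed horizon, for an evenly sampled series."""
    steps = round(horizon_seconds / sample_seconds)
    if steps < 1 or steps >= len(power):
        raise ValueError("horizon must span at least one sample and fit in the series")
    return [power[i + steps] - power[i] for i in range(len(power) - steps)]

def max_abs_ramp(power, sample_seconds, horizon_seconds):
    """Largest absolute ramp at the stated horizon."""
    return max(abs(r) for r in ramps(power, sample_seconds, horizon_seconds))

# Hypothetical one-minute samples of aggregate power (arbitrary units).
trace = [100.0, 120.0, 90.0, 95.0, 140.0]
print(ramps(trace, 60, 60))    # one-minute ramps
print(ramps(trace, 60, 120))   # two-minute ramps
```

The same trace yields different ramp values at different horizons, so the metric cannot be interpreted until the horizon is stated.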
Simulated Author's Rebuttal
We thank the referee for the insightful comments, which have helped us improve the clarity and robustness of our work. We address each major comment below and describe the revisions made to the manuscript.
- Referee: [Methods / Simulation Framework] The central claim rests entirely on the trace-calibrated framework (described in the methods section). No quantitative comparison to measured power traces from operating AI clusters is presented, nor are error bars or sensitivity checks to calibration parameters or unmodeled effects (DVFS transitions, network contention) reported. This leaves the reported U- and hump-shaped patterns without external grounding.
  Authors: We concur that additional validation would enhance the credibility of the simulation results. Our framework is calibrated on workload arrival traces from public sources and employs GPU power models derived from hardware specifications and utilization. In the revised manuscript, we have expanded the Methods section to include detailed calibration procedures, error bars on key metrics derived from multiple simulation runs, and sensitivity analyses varying calibration parameters. We have also added a discussion of potential unmodeled effects such as DVFS transitions and network contention, noting their likely minor impact under the modeled conditions. However, we lack access to real-time power measurement data from commercial AI clusters, which prevents a direct quantitative match; this is acknowledged as a limitation and suggested for future research.
  Revision: partial
- Referee: [Results] Results section: the qualitative shapes are shown for varying inference shares and load levels, but the manuscript supplies neither statistical significance tests on the turning points nor robustness checks when the underlying arrival traces or scheduling policy parameters are perturbed. These omissions are load-bearing because the decoupling conclusion is presented as a general system property rather than a model-specific observation.
  Authors: We appreciate this point and have revised the Results section to include statistical significance assessments. Specifically, we now report p-values from trend tests on the variability and ramping curves to confirm the U- and hump-shapes, along with confidence intervals obtained via bootstrapping over simulation replicates. Furthermore, we conducted robustness experiments by resampling the arrival traces and altering scheduling parameters (e.g., batch queue weights), demonstrating that the decoupling patterns remain consistent. These new analyses are incorporated into the main text and supplementary material, supporting the generality of the observed behavior beyond specific parameter choices.
  Revision: yes
- Not addressed in the revision: a direct quantitative comparison to measured power traces from operating AI clusters, because such proprietary data are unavailable.
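The bootstrapped confidence intervals mentioned in the authors' response could be computed along the following lines. This is a percentile-bootstrap sketch under stated assumptions: the function name `bootstrap_ci`, the resample count, and the replicate values in the example are hypothetical and are not taken from the paper.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` over simulation replicates."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the replicates with replacement and recompute the statistic.
    boot_stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = n_boot - 1 - lo_idx  # symmetric upper index
    return boot_stats[lo_idx], boot_stats[hi_idx]

# Hypothetical power-variability values from five simulation replicates.
replicates = [12.1, 11.8, 12.6, 12.3, 11.9]
low, high = bootstrap_ci(replicates)
print(f"95% CI for mean variability: [{low:.2f}, {high:.2f}]")
```

With only a handful of replicates the percentile interval can be narrow and optimistic, so the number of replicates matters as much as the number of resamples.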
Circularity Check
No significant circularity; results emerge from simulation
The paper derives U-shaped variability and hump-shaped ramping as emergent outputs of a trace-calibrated framework that maps workload arrivals through queueing, scheduling, and GPU power models. No equations define the target shapes in terms of themselves, no fitted parameters are renamed as predictions, and no self-citation chain is invoked to force the asymmetric propagation result. The central claim rests on the model's mechanics applied to input traces rather than on any definitional equivalence or load-bearing self-reference, leaving the derivation self-contained.