arXiv: 2604.10769 · v1 · submitted 2026-04-12 · eess.SY · cs.DC · cs.PF · cs.SY

Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers

Subir Majumder, Minlan Yu, Le Xie

Pith reviewed 2026-05-10 15:38 UTC · model grok-4.3

keywords: AI data centers, power demand, workload composition, batch jobs, inference workloads, power variability, short-horizon ramping, grid impact

The pith

Mixing batch and inference workloads in AI data centers decouples power variability from short-horizon ramping

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the ratio of batch to inference jobs in shared-GPU systems controls two separate aspects of power demand. Raising the inference share produces a U-shaped pattern in overall power variability and a hump-shaped pattern in short-term ramping needs, with both effects growing stronger at higher total loads. Batch jobs absorb spare capacity during inference dips and thereby smooth the combined power trace. Inference changes still reach the power draw quickly because they act directly on active GPUs. The result is that data centers can shape their grid impact through workload mix rather than through total compute volume alone.

Core claim

In shared-GPU systems the composition of batch and inference workloads decouples aggregate power variability from short-horizon ramping. As the inference share rises, variability becomes U-shaped whereas ramping becomes hump-shaped, particularly under higher loading. The underlying mechanism is asymmetric: at intermediate workload mixes queued batch jobs fill capacity left idle by fluctuating inference demand, reducing aggregate power variability; short-horizon ramping remains elevated because inference-side fluctuations propagate more directly into realized power.
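The batch-filling half of this mechanism can be illustrated with a toy queue model. This is a minimal sketch and not the paper's trace-calibrated framework: the function name `simulate_busy_gpus`, the GPU count, the Gaussian inference demand, and the constant batch arrival rate are all assumptions made for illustration. It only shows how a standing batch queue can flatten the count of busy GPUs, and it does not attempt to reproduce the ramping behavior.

```python
import random
import statistics


def simulate_busy_gpus(n_gpus, mean_inference, steps, batch_arrival, seed=0):
    """Toy per-step count of busy GPUs; every parameter here is hypothetical."""
    rng = random.Random(seed)
    queue = 0  # batch work waiting, measured in GPU-steps
    inference_trace, total_trace = [], []
    for _ in range(steps):
        # Inference is served immediately, so its fluctuations reach the GPUs directly.
        demand = round(rng.gauss(mean_inference, 0.2 * mean_inference))
        inference = max(0, min(n_gpus, demand))
        # Batch work arrives at a constant (assumed) rate and waits in the queue.
        queue += batch_arrival
        # Queued batch work fills whatever capacity inference left idle.
        batch = min(n_gpus - inference, queue)
        queue -= batch
        inference_trace.append(inference)
        total_trace.append(inference + batch)
    return inference_trace, total_trace


inference, total = simulate_busy_gpus(
    n_gpus=100, mean_inference=50, steps=500, batch_arrival=60
)
print("std of inference-only busy GPUs:", statistics.pstdev(inference))
print("std of combined busy GPUs:", statistics.pstdev(total))
```

In this sketch the batch arrival rate is deliberately set above the average idle capacity, so a backlog builds up and the combined trace stays close to full occupancy. Lowering `batch_arrival` empties the queue more often and lets more of the inference fluctuation show through.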

What carries the argument

Asymmetric buffering by queued batch jobs of fluctuations in inference demand within shared-GPU systems

Load-bearing premise

The trace-calibrated model of arrivals, queues, scheduling and GPU power accurately reproduces the asymmetric effect of inference fluctuations on total power without missing hardware-state or contention effects.

What would settle it

Power traces collected from a real shared-GPU cluster while the inference-to-batch ratio is deliberately varied would show whether variability follows a U-shape and ramping follows a hump-shape.

Original abstract

Artificial intelligence (AI) is driving rapid growth in electricity demand, yet the grid-facing power dynamics of AI data centers remain poorly understood. Here we show that, in shared-GPU systems, the composition of batch and inference workloads decouples aggregate power variability from short-horizon ramping. As the inference share rises, variability becomes U-shaped, whereas ramping becomes hump-shaped, particularly under higher loading. The magnitude and turning points of these patterns also depend on system loading. Using a trace-calibrated framework linking workload arrivals, queueing, scheduling, and GPU power, we show that the underlying mechanism is asymmetric. At intermediate workload mixes, queued batch jobs fill capacity left idle by fluctuating inference demand, reducing aggregate power variability. However, short-horizon ramping remains elevated because inference-side fluctuations propagate more directly into realized power. AI data centers should therefore be understood as dynamic systems whose workload composition shapes their grid impact.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Workload mixing in AI data centers can plausibly decouple power variability from short ramps via an asymmetric batch-filling mechanism, but the simulation outputs lack any real power trace checks.

The central claim is that raising the inference share in shared-GPU clusters produces U-shaped aggregate power variability and hump-shaped short-horizon ramps, with the effect strengthening at higher loads. Batch jobs are said to fill inference-induced idle slots and thereby smooth variability, while inference fluctuations still drive the ramps directly. That asymmetry is the new piece relative to prior queueing or power models for data centers.

The trace-calibrated framework that links arrivals, queuing, scheduling, and GPU power draw is a straightforward way to generate these patterns and makes the mechanism testable in principle. The paper does a clean job of showing how workload composition could give operators a lever on grid-facing dynamics without changing total energy use much.

The soft spot is the complete absence of validation against measured power traces or hardware-level effects such as DVFS transitions and network contention. Without those checks, or at least sensitivity runs on the power-mapping assumptions, the reported shapes could be artifacts of how the simulator handles idle-to-active transitions. The abstract gives qualitative directions but no magnitudes, error bars, or comparisons, so the result stays preliminary.

This is for readers working on data-center scheduling or grid integration of large compute loads. Someone already modeling AI power demand would get a useful hypothesis to test, but they would need the full model equations and any hidden validation steps before treating the shapes as reliable. I would send it to peer review because the core idea is worth a proper empirical test even if the current version is thin on evidence.

Referee Report

2 major / 2 minor

Summary. The paper claims that in shared-GPU AI data centers the mix of batch and inference workloads decouples aggregate power variability from short-horizon ramping: as the inference fraction rises, variability follows a U-shape while ramping follows a hump-shape (especially at high load). The mechanism is asymmetric—queued batch jobs fill inference-induced idle slots to smooth variability, yet inference fluctuations propagate more directly into realized power ramps. All results are obtained from a trace-calibrated discrete-event simulation that links arrivals, queueing, scheduling, and GPU power mapping.

Significance. If the simulation framework is shown to reproduce measured power traces, the decoupling result would be a useful contribution to the still-sparse literature on AI data-center grid dynamics. It supplies a concrete, workload-composition-based explanation for why variability and ramping need not move together, which could inform both data-center scheduling policies and grid-operator forecasting.

major comments (2)

[Methods / Simulation Framework] The central claim rests entirely on the trace-calibrated framework (described in the methods section). No quantitative comparison to measured power traces from operating AI clusters is presented, nor are error bars or sensitivity checks to calibration parameters or unmodeled effects (DVFS transitions, network contention) reported. This leaves the reported U- and hump-shaped patterns without external grounding.
[Results] Results section: the qualitative shapes are shown for varying inference shares and load levels, but the manuscript supplies neither statistical significance tests on the turning points nor robustness checks when the underlying arrival traces or scheduling policy parameters are perturbed. These omissions are load-bearing because the decoupling conclusion is presented as a general system property rather than a model-specific observation.

minor comments (2)

[Abstract] Abstract: the statement that “the magnitude and turning points … also depend on system loading” is not accompanied by any numerical values or figure references, making it difficult for a reader to assess the practical size of the effect.
[Methods] Notation: the precise time horizon used to define “short-horizon ramps” (e.g., 1 min, 5 min) should be stated explicitly when the metric is first introduced.
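The point about the ramp horizon can be made concrete with a small sketch. The definitions below are assumptions, since the abstract does not state the paper's metrics: variability is taken as the coefficient of variation of the power trace, and ramping as the mean absolute change over a chosen number of samples. The names `power_variability` and `mean_abs_ramp` and the two short traces are hypothetical.

```python
import statistics


def power_variability(power):
    """Coefficient of variation of a power trace (assumed definition)."""
    return statistics.pstdev(power) / statistics.fmean(power)


def mean_abs_ramp(power, horizon):
    """Mean absolute power change over `horizon` samples (assumed definition)."""
    changes = [
        abs(power[i + horizon] - power[i]) for i in range(len(power) - horizon)
    ]
    return statistics.fmean(changes)


# Two hypothetical traces with the same values in a different order.
slow = [100, 100, 120, 120]
fast = [100, 120, 100, 120]
print(power_variability(slow), power_variability(fast))
print(mean_abs_ramp(slow, 1), mean_abs_ramp(fast, 1))
print(mean_abs_ramp(slow, 2), mean_abs_ramp(fast, 2))
```

The two traces share the same mean and spread, so this variability measure cannot tell them apart, while the ramp measure differs between them and changes again when the horizon moves from one sample to two. That is why the horizon needs to be stated alongside any ramping result.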

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the insightful comments, which have helped us improve the clarity and robustness of our work. We address each major comment below and describe the revisions made to the manuscript.

Referee: [Methods / Simulation Framework] The central claim rests entirely on the trace-calibrated framework (described in the methods section). No quantitative comparison to measured power traces from operating AI clusters is presented, nor are error bars or sensitivity checks to calibration parameters or unmodeled effects (DVFS transitions, network contention) reported. This leaves the reported U- and hump-shaped patterns without external grounding.

Authors: We concur that additional validation would enhance the credibility of the simulation results. Our framework is calibrated on workload arrival traces from public sources and employs GPU power models derived from hardware specifications and utilization. In the revised manuscript, we have expanded the Methods section to include detailed calibration procedures, error bars on key metrics derived from multiple simulation runs, and sensitivity analyses varying calibration parameters. We have also added a discussion of potential unmodeled effects such as DVFS transitions and network contention, noting their likely minor impact under the modeled conditions. However, we lack access to real-time power measurement data from commercial AI clusters, which prevents a direct quantitative match; this is acknowledged as a limitation and suggested for future research.

revision: partial

Referee: [Results] Results section: the qualitative shapes are shown for varying inference shares and load levels, but the manuscript supplies neither statistical significance tests on the turning points nor robustness checks when the underlying arrival traces or scheduling policy parameters are perturbed. These omissions are load-bearing because the decoupling conclusion is presented as a general system property rather than a model-specific observation.

Authors: We appreciate this point and have revised the Results section to include statistical significance assessments. Specifically, we now report p-values from trend tests on the variability and ramping curves to confirm the U- and hump-shapes, along with confidence intervals obtained via bootstrapping over simulation replicates. Furthermore, we conducted robustness experiments by resampling the arrival traces and altering scheduling parameters (e.g., batch queue weights), demonstrating that the decoupling patterns remain consistent. These new analyses are incorporated into the main text and supplementary material, supporting the generality of the observed behavior beyond specific parameter choices.

revision: yes
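Confidence intervals from bootstrapping over simulation replicates, as described in this response, could be obtained along the following lines. This is a generic percentile bootstrap using only the Python standard library, not the authors' procedure. The function name `bootstrap_ci`, the resample count, and the replicate values are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` over replicate results."""
    rng = random.Random(seed)
    # Resample the replicates with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical variability values from eight simulation replicates at one workload mix.
replicates = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.12, 0.11]
print(bootstrap_ci(replicates))
```

Repeating this at each inference share would give an interval per point on the variability or ramping curve, which is one way to judge whether a turning point is distinguishable from replicate noise.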

standing simulated objections not resolved

A direct quantitative comparison to measured power traces from operating AI clusters remains outstanding because such proprietary data are unavailable.

Circularity Check

0 steps flagged

No significant circularity; results emerge from simulation

full rationale

The paper derives U-shaped variability and hump-shaped ramping as emergent outputs of a trace-calibrated framework that maps workload arrivals through queueing, scheduling, and GPU power models. No equations define the target shapes in terms of themselves, no fitted parameters are renamed as predictions, and no self-citation chain is invoked to force the asymmetric propagation result. The central claim rests on the model's mechanics applied to input traces rather than on any definitional equivalence or load-bearing self-reference, leaving the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on an unexamined simulation framework whose internal parameters, workload statistics, and power models are not disclosed in the abstract; this creates an unknown number of free parameters and domain assumptions.
