pith. the verified trust layer for science.

arXiv: 2604.10769 · v1 · submitted 2026-04-12 · 📡 eess.SY · cs.DC · cs.PF · cs.SY

Workload composition smooths aggregate power demand while sustaining short-horizon ramps in AI data centers

Pith reviewed 2026-05-10 15:38 UTC · model grok-4.3

classification 📡 eess.SY · cs.DC · cs.PF · cs.SY
keywords AI data centers · power demand · workload composition · batch jobs · inference workloads · power variability · short-horizon ramping · grid impact

The pith

Mixing batch and inference workloads in AI data centers decouples power variability from short-horizon ramping

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the ratio of batch to inference jobs in shared-GPU systems controls two separate aspects of power demand. Raising the inference share produces a U-shaped pattern in overall power variability and a hump-shaped pattern in short-term ramping needs, with both effects growing stronger at higher total loads. Batch jobs absorb spare capacity during inference dips and thereby smooth the combined power trace. Inference changes still reach the power draw quickly because they act directly on active GPUs. The result is that data centers can shape their grid impact through workload mix rather than through total compute volume alone.

Core claim

In shared-GPU systems the composition of batch and inference workloads decouples aggregate power variability from short-horizon ramping. As the inference share rises, variability becomes U-shaped whereas ramping becomes hump-shaped, particularly under higher loading. The underlying mechanism is asymmetric: at intermediate workload mixes queued batch jobs fill capacity left idle by fluctuating inference demand, reducing aggregate power variability; short-horizon ramping remains elevated because inference-side fluctuations propagate more directly into realized power.
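The buffering half of this mechanism can be illustrated with a toy sketch. It is not the paper's trace-calibrated framework: the function name `backfill_trace`, the unit of one GPU-step of batch work, and the sample numbers are all assumptions made for illustration. The sketch only shows that queued batch work flattens the busy-GPU count while the backlog lasts, and that inference fluctuations pass straight through once the backlog is drained. It does not model the ramping side of the claim.

```python
def backfill_trace(inference_gpus, capacity, batch_backlog):
    """Toy model (assumption): queued batch work fills idle GPUs at every step.

    inference_gpus: GPUs demanded by inference at each time step.
    capacity:       total GPUs in the shared pool.
    batch_backlog:  queued batch work, in GPU-steps (hypothetical unit).
    Returns the number of busy GPUs at each step.
    """
    busy = []
    backlog = batch_backlog
    for demand in inference_gpus:
        inference = min(demand, capacity)            # inference is served first
        filled = min(capacity - inference, backlog)  # batch takes idle GPUs
        backlog -= filled                            # each filled GPU burns one GPU-step
        busy.append(inference + filled)
    return busy


inference = [6, 2, 7, 3, 6, 2]  # illustrative numbers, not measured data

print(backfill_trace(inference, capacity=8, batch_backlog=100))  # [8, 8, 8, 8, 8, 8]
print(backfill_trace(inference, capacity=8, batch_backlog=8))    # [8, 8, 7, 3, 6, 2]
print(backfill_trace(inference, capacity=8, batch_backlog=0))    # [6, 2, 7, 3, 6, 2]
```

With a deep queue the aggregate trace is flat; with a shallow queue the smoothing stops as soon as the backlog is exhausted, which is one way to see why the effect would depend on the workload mix.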

What carries the argument

Asymmetric buffering by queued batch jobs of fluctuations in inference demand within shared-GPU systems

Load-bearing premise

The trace-calibrated model of arrivals, queues, scheduling and GPU power accurately reproduces the asymmetric effect of inference fluctuations on total power without missing hardware-state or contention effects.

What would settle it

Power traces collected from a real shared-GPU cluster while the inference-to-batch ratio is deliberately varied would show whether variability follows a U-shape and ramping follows a hump-shape.

Original abstract

Artificial intelligence (AI) is driving rapid growth in electricity demand, yet the grid-facing power dynamics of AI data centers remain poorly understood. Here we show that, in shared-GPU systems, the composition of batch and inference workloads decouples aggregate power variability from short-horizon ramping. As the inference share rises, variability becomes U-shaped, whereas ramping becomes hump-shaped, particularly under higher loading. The magnitude and turning points of these patterns also depend on system loading. Using a trace-calibrated framework linking workload arrivals, queueing, scheduling, and GPU power, we show that the underlying mechanism is asymmetric. At intermediate workload mixes, queued batch jobs fill capacity left idle by fluctuating inference demand, reducing aggregate power variability. However, short-horizon ramping remains elevated because inference-side fluctuations propagate more directly into realized power. AI data centers should therefore be understood as dynamic systems whose workload composition shapes their grid impact.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that in shared-GPU AI data centers the mix of batch and inference workloads decouples aggregate power variability from short-horizon ramping: as the inference fraction rises, variability follows a U-shape while ramping follows a hump-shape (especially at high load). The mechanism is asymmetric—queued batch jobs fill inference-induced idle slots to smooth variability, yet inference fluctuations propagate more directly into realized power ramps. All results are obtained from a trace-calibrated discrete-event simulation that links arrivals, queueing, scheduling, and GPU power mapping.

Significance. If the simulation framework is shown to reproduce measured power traces, the decoupling result would be a useful contribution to the still-sparse literature on AI data-center grid dynamics. It supplies a concrete, workload-composition-based explanation for why variability and ramping need not move together, which could inform both data-center scheduling policies and grid-operator forecasting.

major comments (2)
  1. [Methods / Simulation Framework] The central claim rests entirely on the trace-calibrated framework (described in the methods section). No quantitative comparison to measured power traces from operating AI clusters is presented, and no error bars or sensitivity checks against calibration parameters or unmodeled effects (DVFS transitions, network contention) are reported. This leaves the reported U- and hump-shaped patterns without external grounding.
  2. [Results] The qualitative shapes are shown for varying inference shares and load levels, but the manuscript supplies neither statistical significance tests on the turning points nor robustness checks when the underlying arrival traces or scheduling policy parameters are perturbed. These omissions are load-bearing because the decoupling conclusion is presented as a general system property rather than a model-specific observation.
minor comments (2)
  1. [Abstract] The statement that “the magnitude and turning points … also depend on system loading” is not accompanied by any numerical values or figure references, making it difficult for a reader to assess the practical size of the effect.
  2. [Methods] Notation: the precise time horizon used to define “short-horizon ramps” (e.g., 1 min, 5 min) should be stated explicitly when the metric is first introduced.
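
To make the horizon point concrete, here is a minimal sketch of how a ramp metric changes with the chosen horizon. The paper's actual metric definitions are not given on this page, so the functions `ramps`, `max_abs_ramp`, and `coefficient_of_variation`, the choice of maximum absolute change as the ramp summary, and the sample trace are all assumptions.

```python
from statistics import mean, pstdev


def ramps(power_kw, sample_s, horizon_s):
    """Power change over a fixed horizon: P[t + horizon] - P[t] for every t."""
    if horizon_s <= 0 or horizon_s % sample_s != 0:
        raise ValueError("horizon must be a positive multiple of the sampling interval")
    k = horizon_s // sample_s
    return [power_kw[i + k] - power_kw[i] for i in range(len(power_kw) - k)]


def max_abs_ramp(power_kw, sample_s, horizon_s):
    """Largest absolute ramp at the given horizon (one possible summary)."""
    return max(abs(r) for r in ramps(power_kw, sample_s, horizon_s))


def coefficient_of_variation(power_kw):
    """Aggregate variability as standard deviation over mean (one possible summary)."""
    return pstdev(power_kw) / mean(power_kw)


trace = [100, 120, 90, 150, 140]  # illustrative kW samples at 60 s spacing

print(ramps(trace, sample_s=60, horizon_s=60))          # [20, -30, 60, -10]
print(max_abs_ramp(trace, sample_s=60, horizon_s=60))   # 60
print(max_abs_ramp(trace, sample_s=60, horizon_s=120))  # 50
print(coefficient_of_variation(trace))
```

The same trace yields a different ramp figure at 1 min and 2 min, which is why the horizon needs to be stated alongside the metric.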

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the insightful comments, which have helped us improve the clarity and robustness of our work. We address each major comment below and describe the revisions made to the manuscript.

Point-by-point responses
  1. Referee: [Methods / Simulation Framework] The central claim rests entirely on the trace-calibrated framework (described in the methods section). No quantitative comparison to measured power traces from operating AI clusters is presented, and no error bars or sensitivity checks against calibration parameters or unmodeled effects (DVFS transitions, network contention) are reported. This leaves the reported U- and hump-shaped patterns without external grounding.

    Authors: We concur that additional validation would enhance the credibility of the simulation results. Our framework is calibrated on workload arrival traces from public sources and employs GPU power models derived from hardware specifications and utilization. In the revised manuscript, we have expanded the Methods section to include detailed calibration procedures, error bars on key metrics derived from multiple simulation runs, and sensitivity analyses varying calibration parameters. We have also added a discussion of potential unmodeled effects such as DVFS transitions and network contention, noting their likely minor impact under the modeled conditions. However, we lack access to real-time power measurement data from commercial AI clusters, which prevents a direct quantitative match; this is acknowledged as a limitation and suggested for future research.
    Revision: partial

  2. Referee: [Results] The qualitative shapes are shown for varying inference shares and load levels, but the manuscript supplies neither statistical significance tests on the turning points nor robustness checks when the underlying arrival traces or scheduling policy parameters are perturbed. These omissions are load-bearing because the decoupling conclusion is presented as a general system property rather than a model-specific observation.

    Authors: We appreciate this point and have revised the Results section to include statistical significance assessments. Specifically, we now report p-values from trend tests on the variability and ramping curves to confirm the U- and hump-shapes, along with confidence intervals obtained via bootstrapping over simulation replicates. Furthermore, we conducted robustness experiments by resampling the arrival traces and altering scheduling parameters (e.g., batch queue weights), demonstrating that the decoupling patterns remain consistent. These new analyses are incorporated into the main text and supplementary material, supporting the generality of the observed behavior beyond specific parameter choices.
    Revision: yes
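
For readers unfamiliar with the bootstrapping step mentioned in this response, a percentile bootstrap over replicate-level metrics could look like the sketch below. It is a generic illustration, not the authors' procedure: the function `bootstrap_ci`, the resample count, the seed, and the replicate values are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over replicates."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    resampled = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return resampled[lo_idx], resampled[hi_idx]


# Placeholder per-replicate variability values (made up for illustration only).
replicate_cv = [0.21, 0.19, 0.24, 0.20, 0.22, 0.18, 0.23, 0.21]

low, high = bootstrap_ci(replicate_cv)
print(f"mean = {mean(replicate_cv):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

An interval like this would be computed separately at each inference share, so that a claimed turning point can be judged against the spread across replicates.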

standing simulated objections not resolved
  • A direct quantitative comparison to measured power traces from operating AI clusters is still missing, because such proprietary data are unavailable.

Circularity Check

0 steps flagged

No significant circularity; results emerge from simulation

full rationale

The paper derives U-shaped variability and hump-shaped ramping as emergent outputs of a trace-calibrated framework that maps workload arrivals through queueing, scheduling, and GPU power models. No equations define the target shapes in terms of themselves, no fitted parameters are renamed as predictions, and no self-citation chain is invoked to force the asymmetric propagation result. The central claim rests on the model's mechanics applied to input traces rather than on any definitional equivalence or load-bearing self-reference, leaving the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on an unexamined simulation framework whose internal parameters, workload statistics, and power models are not disclosed in the abstract; this creates an unknown number of free parameters and domain assumptions.

pith-pipeline@v0.9.0 · 5471 in / 1191 out tokens · 39829 ms · 2026-05-10T15:38:02.616030+00:00 · methodology

