
arxiv: 2604.27207 · v1 · submitted 2026-04-29 · 📡 eess.SY · cs.SY

Regime-Adaptive Weighted Ensemble Learning for Computing-Driven Dynamic Load Forecasting in AI Data Centers

Pith reviewed 2026-05-07 08:13 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords AI data centers · load forecasting · ensemble learning · dynamic weighting · non-stationary time series · demand response · computing workloads · short-term prediction

The pith

A regime-adaptive ensemble learning method reduces minute-class load forecasting errors for AI data centers to below 1%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a regime-adaptive ensemble learning algorithm for short-term forecasting of computing-driven dynamic loads in AI data centers. The method uses a weight-learned neural network to combine two machine learning submodels by dynamically adjusting their contributions according to changing operating regimes. A new incremental feature engineering approach updates the model continuously from non-stationary data streams. Comparative tests on the MIT Supercloud dataset show the ensemble outperforms alternative model combinations and achieves forecasting errors below 1 percent. If the claim holds, the approach would enable tighter coordination between data centers and power grids for demand response.

Core claim

The paper claims that embedding a weight-learned neural network inside an ensemble framework, paired with incremental feature engineering, allows the system to exploit complementary strengths of two machine learning submodels across varying regimes, adapt to non-stationary computing-driven workloads, and deliver the first sub-1% error rates on minute-ahead predictions for AI data center loads.

What carries the argument

A weight-learned neural network that dynamically sets ensemble weights for two complementary machine learning submodels, updated via incremental feature engineering on non-stationary streams.
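
The weighting idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not specify the network's architecture or inputs, so the single linear layer, the softmax normalization, the regime descriptors, and every name below (`gate_weights`, `ensemble_forecast`, `W`, `b`) are assumptions made for the example.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate_weights(features, W, b):
    """Hypothetical one-layer gating network: one score per submodel."""
    scores = [sum(w * x for w, x in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    return softmax(scores)

def ensemble_forecast(features, pred_a, pred_b, W, b):
    """Blend two submodel forecasts with regime-dependent weights."""
    w_a, w_b = gate_weights(features, W, b)
    return w_a * pred_a + w_b * pred_b

# Illustrative parameters (made up, not learned from any data).
W = [[2.0, -1.0], [-2.0, 1.0]]
b = [0.0, 0.0]

# Two made-up regime descriptors, e.g. [recent volatility, recent stability].
steady = ensemble_forecast([0.0, 1.0], pred_a=100.0, pred_b=110.0, W=W, b=b)
bursty = ensemble_forecast([1.0, 0.0], pred_a=100.0, pred_b=110.0, W=W, b=b)
```

With these made-up parameters the blend leans toward submodel B in the "steady" case and toward submodel A in the "bursty" case, which is the behavior the premise depends on: the weights must track the regime, and the two submodels must actually differ in where they are accurate.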

If this is right

  • Forecasting accuracy and adaptivity improve across diverse operating regimes in AI data centers.
  • The selected pair of machine learning models outperforms other possible ensemble combinations.
  • The method supports grid-interactive coordination and demand response programs.
  • Minute-class forecasting errors for AI data center loads reach below 1% for the first time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The adaptive weighting could integrate with real-time energy optimization systems to lower power costs in large computing facilities.
  • Similar dynamic ensemble techniques might apply to forecasting other bursty loads such as electric vehicle charging or wind generation.
  • Validation on commercial cloud datasets would check whether the sub-1% error holds beyond academic supercomputer workloads.
  • Linking the forecasts to electricity market bidding tools could enable proactive demand response participation.

Load-bearing premise

That the two chosen machine learning submodels will reliably display complementary strengths across all operating regimes and that the neural network can learn stable weights without overfitting to non-stationary load patterns.

What would settle it

A test on load data from a different AI data center where the ensemble error stays above 1% or where one submodel alone matches the ensemble accuracy.
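
One way to phrase this test as a check, as a sketch under stated assumptions: the function names, the tolerance `tol`, and the choice of mean-normalized MAE are illustrative, since the abstract does not state which error metric underlies the sub-1% figure. The numbers are made up to exercise the logic.

```python
def nmae(actual, predicted):
    """Mean absolute error divided by the mean of the actual series."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return mae / (sum(actual) / len(actual))

def settles_against(actual, ens, sub_a, sub_b, threshold=0.01, tol=1e-4):
    """True if either falsifying condition holds on the new data:
    the ensemble error exceeds the threshold, or the better single
    submodel is within `tol` of the ensemble error."""
    e_ens = nmae(actual, ens)
    e_best_single = min(nmae(actual, sub_a), nmae(actual, sub_b))
    return e_ens > threshold or e_best_single <= e_ens + tol

# Made-up load values and forecasts, not data from any real facility.
actual = [100.0, 120.0, 140.0, 160.0, 200.0]
ens = [a + 0.5 for a in actual]
sub_a = [a + 3.0 for a in actual]
sub_b = [a - 5.0 for a in actual]
verdict = settles_against(actual, ens, sub_a, sub_b)
```

Here `verdict` is `False`: the hypothetical ensemble stays under the threshold and clearly beats both submodels, so the claim would survive this particular check.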

Figures

Figures reproduced from arXiv: 2604.27207 by Lei Wang, Ying Zhang, Yuzhang Lin, Ziying Wang.

Figure 1: Architecture of the proposed regime-adaptive ensemble learning method that can adapt to various computing regimes.
Figure 2: Power forecasting results over a future forecasting horizon involving
Figure 3: Prediction performance under four different operating regimes: (a)
Figure 4: Forecasting errors of two-model ensembles for one-step-ahead prediction. (a) NRMSE. (b) NMAE.

Original abstract

Short-term load forecasting for AI data centers presents new challenges because it is computing-driven, with heterogeneous job arrivals, sizes, and durations exhibiting bursty, non-stationary dynamics. Compared with traditional load types, data center loads are less researched and can pose greater threats to the efficiency and stability of power grids. To close the gap, this paper proposes a regime-adaptive ensemble learning forecasting algorithm to predict computing-driven dynamic workloads in AI data centers. A weight-learned neural network within an ensemble learning framework is developed to exploit the complementary strengths of two machine learning (ML) submodels across varying operating regimes. Furthermore, a novel feature engineering strategy is developed to incrementally learn from a non-stationary data stream. Thus, the ensemble weights are dynamically optimized to facilitate adaptive calibration of inter-submodel contributions. Comparative case studies on the MIT Supercloud dataset demonstrate that the proposed method significantly enhances load forecasting accuracy and adaptivity across various regimes, and the selected combination of ML models for ensemble learning outperforms other possible combinations. To the best of our knowledge, our method is the first to reduce minute-class forecasting errors for AI data center loads to below 1%, highlighting its potential for grid-interactive coordination and demand response.
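
The abstract does not describe the feature engineering strategy in detail. As a rough illustration of what computing features incrementally from a stream can look like, the sketch below keeps running sums over a fixed window so each new sample updates the features in constant time. The class name `RollingFeatures`, the window length, and the chosen features (last value, rolling mean, rolling standard deviation, one-step change) are all assumptions, not the paper's method.

```python
from collections import deque
import math

class RollingFeatures:
    """Maintain rolling-statistics features over a fixed-length window."""

    def __init__(self, window):
        self.window = window
        self.buf = deque(maxlen=window)
        self.total = 0.0      # running sum of values in the window
        self.total_sq = 0.0   # running sum of squared values in the window

    def update(self, value):
        """Add one new load sample and return the refreshed features."""
        if len(self.buf) == self.window:
            oldest = self.buf[0]  # will be evicted by the append below
            self.total -= oldest
            self.total_sq -= oldest * oldest
        self.buf.append(value)
        self.total += value
        self.total_sq += value * value
        n = len(self.buf)
        mean = self.total / n
        # Population variance; clamp tiny negatives caused by rounding.
        var = max(self.total_sq / n - mean * mean, 0.0)
        return {
            "last": value,
            "mean": mean,
            "std": math.sqrt(var),
            "delta": value - self.buf[-2] if n > 1 else 0.0,
        }

# Made-up load samples arriving one at a time.
feats = RollingFeatures(window=3)
history = [feats.update(v) for v in [10.0, 12.0, 14.0, 20.0]]
```

After the fourth sample the window holds the last three values only, so the features follow recent behavior rather than the full history, which is one simple way to cope with non-stationarity.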

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a regime-adaptive weighted ensemble learning algorithm for short-term forecasting of computing-driven dynamic loads in AI data centers. It develops a weight-learned neural network to combine two ML submodels by exploiting their complementary strengths across operating regimes, along with a novel incremental feature engineering strategy for non-stationary data streams. Comparative studies on the MIT Supercloud dataset are presented to show significant accuracy improvements, with the claim that the method achieves minute-class forecasting errors below 1% for the first time.

Significance. If the empirical results hold under rigorous validation, this work could be significant for the field of power systems and control, particularly in managing the integration of AI data centers into the grid. The focus on bursty, non-stationary loads addresses a timely challenge, and the adaptive ensemble approach offers a practical way to improve forecasting for demand response applications. The use of a real-world dataset such as MIT Supercloud adds to its relevance. However, the significance is tempered by the need for more detailed validation to confirm the claims.

major comments (3)
  1. [Abstract] Abstract: The assertion that the proposed method is the first to reduce minute-class forecasting errors to below 1% is central to the paper's contribution. However, this claim requires supporting evidence from a comprehensive comparison with existing literature on data center load forecasting, which is not detailed in the abstract or summary.
  2. [Abstract] Abstract (comparative case studies): The paper states that the selected combination of ML models outperforms other possible combinations and that the method enhances accuracy across regimes. To substantiate the sub-1% error claim and the adaptivity, an ablation study isolating the regime-adaptive weighting mechanism and the incremental feature engineering is necessary. Without it, the contribution of these components to avoiding overfitting on non-stationary traces remains unclear.
  3. [Abstract] Abstract (method description): The weight-learned neural network for dynamic optimization of ensemble weights is key to the regime-adaptive aspect. Details on the training of this network, including the loss function used, regularization to prevent overfitting, and how it handles the non-stationary nature of the data, should be provided to ensure the results are not due to dataset-specific memorization.
minor comments (2)
  1. [Abstract] Abstract: The abstract mentions 'various regimes' but does not specify how regimes are identified or the number of regimes considered in the experiments.
  2. [Abstract] Abstract: Clarify the exact error metric used for the sub-1% claim (e.g., whether it is MAPE, NRMSE, or another measure) and provide error bars or confidence intervals from multiple runs.
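
Why the metric matters can be shown with a small sketch. The definitions below are common conventions, not necessarily the ones the paper uses (normalizing by the range versus the mean of the actual series is an assumption either way), and all numbers are made up.

```python
import math
from statistics import mean, stdev

def mape(actual, pred):
    """Mean absolute percentage error, as a fraction."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, pred))

def rmse(actual, pred):
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, pred)))

def nrmse_range(actual, pred):
    """RMSE normalized by the range of the actual series."""
    return rmse(actual, pred) / (max(actual) - min(actual))

def nrmse_mean(actual, pred):
    """RMSE normalized by the mean of the actual series."""
    return rmse(actual, pred) / mean(actual)

def summarize_runs(errors):
    """Mean and sample standard deviation of an error across repeated runs."""
    return mean(errors), stdev(errors)

# Made-up series: a high baseline load with a narrow range.
actual = [950.0, 1000.0, 1050.0]
pred = [a + 5.0 for a in actual]

scores = {
    "MAPE": mape(actual, pred),
    "NRMSE (range)": nrmse_range(actual, pred),
    "NRMSE (mean)": nrmse_mean(actual, pred),
}

# Hypothetical per-run errors, not results from the paper.
run_mean, run_std = summarize_runs([0.0091, 0.0097, 0.0104])
```

On this toy series the same forecast scores about 0.5% under MAPE and mean-normalized NRMSE but 5% under range-normalized NRMSE, so a sub-1% claim is only interpretable once the normalization is named. The run summary shows the kind of spread statistic the comment asks for.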

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We have addressed each major comment point by point below. Where revisions are warranted to improve clarity, validation, and substantiation of claims, we will incorporate them in the revised version. We believe these changes will strengthen the paper without altering its core contributions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The assertion that the proposed method is the first to reduce minute-class forecasting errors to below 1% is central to the paper's contribution. However, this claim requires supporting evidence from a comprehensive comparison with existing literature on data center load forecasting, which is not detailed in the abstract or summary.

    Authors: We agree that the abstract should not assert a 'first' claim without explicit supporting details within the abstract itself. The manuscript's introduction contains a literature review and comparisons with prior data center load forecasting approaches, but to address this concern directly, we will revise the abstract to remove the 'first to' phrasing. The revised abstract will instead state that the method achieves minute-class forecasting errors below 1% on the MIT Supercloud dataset, with the full comparative analysis and literature context provided in the main text. This ensures the abstract accurately reflects the paper's content. revision: yes

  2. Referee: [Abstract] Abstract (comparative case studies): The paper states that the selected combination of ML models outperforms other possible combinations and that the method enhances accuracy across regimes. To substantiate the sub-1% error claim and the adaptivity, an ablation study isolating the regime-adaptive weighting mechanism and the incremental feature engineering is necessary. Without it, the contribution of these components to avoiding overfitting on non-stationary traces remains unclear.

    Authors: We concur that an ablation study would provide stronger evidence for the individual contributions of the regime-adaptive weighting and incremental feature engineering. The current manuscript presents comparative results showing the ensemble's superiority over individual models and alternative combinations, along with regime-specific accuracy improvements. However, to directly isolate these components and address potential overfitting on non-stationary data, we will add a dedicated ablation study in the revised manuscript. This will report performance metrics for ablated variants (with/without adaptive weighting and with/without incremental feature engineering) to quantify their roles in achieving sub-1% errors and adaptivity. revision: yes

  3. Referee: [Abstract] Abstract (method description): The weight-learned neural network for dynamic optimization of ensemble weights is key to the regime-adaptive aspect. Details on the training of this network, including the loss function used, regularization to prevent overfitting, and how it handles the non-stationary nature of the data, should be provided to ensure the results are not due to dataset-specific memorization.

    Authors: We thank the referee for highlighting the need for greater methodological detail. The manuscript outlines the weight-learned neural network architecture and its role in the ensemble, but we will expand the method section in the revision to include the requested specifics. This will cover the loss function (mean squared error for weight optimization), regularization techniques (dropout and L2 penalties to prevent overfitting), and mechanisms for non-stationarity (sliding-window training combined with incremental updates). These additions will clarify that the network's performance stems from its adaptive design rather than dataset-specific memorization. revision: yes
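
To make the ingredients named in the third response concrete, here is a toy sketch of sliding-window training with a mean squared error loss and an L2 penalty. It is not the authors' network: a single sigmoid gate stands in for the neural network, dropout is omitted for brevity, and the data, learning rate, window size, and function names are all assumptions.

```python
import math
from collections import deque

def gate(theta, x):
    """Weight given to submodel A; submodel B receives 1 - weight."""
    return 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))

def window_step(theta, window, lr=0.5, l2=1e-3):
    """One gradient step on windowed MSE plus an L2 penalty on theta."""
    grad = [2.0 * l2 * t for t in theta]
    n = len(window)
    for x, pa, pb, y in window:
        w = gate(theta, x)
        pred = w * pa + (1.0 - w) * pb
        # d(pred)/d(theta_i) = (pa - pb) * w * (1 - w) * x_i
        common = 2.0 * (pred - y) * (pa - pb) * w * (1.0 - w) / n
        for i, xi in enumerate(x):
            grad[i] += common * xi
    return [t - lr * g for t, g in zip(theta, grad)]

def train_stream(stream, n_features, window_size=20, steps_per_sample=5):
    """Update the gate incrementally, using only the most recent samples."""
    theta = [0.0] * n_features
    window = deque(maxlen=window_size)
    for sample in stream:
        window.append(sample)
        for _ in range(steps_per_sample):
            theta = window_step(theta, window)
    return theta

def synthetic_stream(n=100, block=5):
    """Made-up data: submodel A is exact in regime +1, submodel B in regime -1."""
    for t in range(n):
        regime = 1.0 if (t // block) % 2 == 0 else -1.0
        y = 10.0 + 0.1 * t
        pa = y if regime > 0 else y + 1.0
        pb = y + 1.0 if regime > 0 else y
        yield ([1.0, regime], pa, pb, y)

theta = train_stream(synthetic_stream(), n_features=2)
w_regime_plus = gate(theta, [1.0, 1.0])    # weight on A when A is accurate
w_regime_minus = gate(theta, [1.0, -1.0])  # weight on A when B is accurate
```

On this synthetic stream the gate learns to favor whichever submodel is accurate in the current regime. Whether the same holds on real traces, without memorizing them, is the question the referee's comment raises.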

Circularity Check

0 steps flagged

No circularity: empirical ML ensemble with externally evaluated performance

Full rationale

The paper describes a data-driven algorithmic framework that trains a weight-learning neural network on observed load traces to combine two submodels, augmented by incremental feature engineering on non-stationary streams. All reported performance figures, including the sub-1% minute-ahead error, are obtained by direct comparison against held-out portions of the external MIT Supercloud dataset rather than by any internal redefinition of the target metric. No equations equate the forecasting loss to a fitted parameter by construction, no uniqueness theorems are imported from prior self-work, and no ansatz is smuggled via citation; the derivation chain consists of standard supervised training and cross-regime evaluation steps that remain independently falsifiable on new data.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on standard machine-learning assumptions about model complementarity and data characteristics plus multiple fitted parameters inside the neural network and submodels; no new physical entities are postulated.

free parameters (2)
  • ensemble weights
    Dynamic weights produced by the neural network that control the contribution of each submodel; these are learned from data.
  • submodel hyperparameters
    Parameters of the two underlying machine learning models and the feature engineering components that are tuned during training.
axioms (2)
  • domain assumption: The two machine learning submodels possess complementary predictive strengths across different operating regimes of AI data center loads.
    This premise justifies the ensemble construction and is invoked to explain why adaptive weighting improves accuracy.
  • domain assumption: Non-stationary data streams from computing workloads can be handled effectively by the proposed incremental feature engineering strategy.
    Central assumption enabling the method to adapt without full retraining on each new data segment.


