Recognition: unknown
Interpretable Physics-Informed Load Forecasting for U.S. Grid Resilience: SHAP-Guided Ensemble Validation in Hybrid Deep Learning Under Extreme Weather
Pith reviewed 2026-05-08 06:29 UTC · model grok-4.3
The pith
A hybrid CNN-Transformer ensemble regularized by ERCOT physics constraints improves electricity load forecasts during extreme weather.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The unified framework integrates CNN and Transformer branches fused by a weighted ensemble, regularized by physics-informed loss from the piecewise parabolic temperature-demand relationship, and interpreted via SHAP. On the ERCOT test data it reaches 713 MW MAE, 812 MW RMSE and 1.18% MAPE, with MAPE dropping 20.7% versus the Transformer and 40.5% versus the CNN on extreme events; the parabolic and ramp constraints account for a 14.7% RMSE reduction in ablation, while SHAP identifies a regime shift where temperature dominates normally but wind speed and precipitation gain influence during cold fronts and heatwaves.
What carries the argument
The physics-informed loss derived from the piecewise parabolic temperature-demand relationship of the ERCOT system, which embeds domain knowledge into the hybrid deep learning training to improve accuracy and interpretability.
If this is right
- The ensemble provides more accurate forecasts for grid operators managing reliability under extreme weather.
- SHAP attributions highlight shifting influences of weather variables, enabling targeted model refinements.
- Ablation studies confirm that the physics constraints contribute substantially to the observed error reductions.
- The identified regime shift suggests that forecasting models may benefit from condition-specific adaptations.
Where Pith is reading between the lines
- This physics-informed approach might extend to load forecasting in other regional grids if analogous temperature-demand relationships can be derived from their historical data.
- The SHAP-guided insights could inform the development of adaptive ensemble weights that change based on detected weather regimes.
- Future work could test the framework's robustness by applying it to projected climate data with intensified extreme weather events.
Load-bearing premise
The piecewise parabolic temperature-demand relationship derived from ERCOT historical data continues to hold without material deviation in the held-out test window and during the specific extreme weather events examined.
What would settle it
Applying the same framework to load and weather data from a different U.S. grid operator where the temperature-demand curve deviates from the parabolic form and checking whether MAPE on extreme events still improves over the individual branches.
Figures
read the original abstract
Accurate short-term electricity load forecasting is a cornerstone of U.S. grid reliability; however, prevailing deep learning models remain opaque, limiting operator trust during extreme weather. A unified, interpretable, physics-informed ensemble framework is proposed, integrating a Convolutional Neural Network (CNN) branch for local feature extraction and a Transformer branch for long-range dependency modeling; the branches are fused through a validation-optimized weighted ensemble and regularized by a physics-informed loss derived from the piecewise parabolic temperature-demand relationship of the Electric Reliability Council of Texas (ERCOT) system. Post-hoc interpretability is provided through SHapley Additive exPlanations (SHAP) with the DeepExplainer backend, yielding global and event-level attributions. Using eight years of ERCOT hourly load data (2018-2025) fused with Automated Surface Observing System (ASOS) records from three Texas stations, the framework achieves 713 MW MAE, 812 MW RMSE, and 1.18% MAPE on the test window. For Hampel-flagged extreme events, MAPE falls by 20.7% relative to its Transformer branch and by 40.5% relative to its CNN branch; an ablation confirms that the parabolic and ramp constraints drive a 14.7% RMSE reduction. SHAP analysis reveals a regime shift: temperature dominates under normal operation, whereas wind speed and precipitation become more influential during cold fronts and heatwaves.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a hybrid physics-informed ensemble for short-term ERCOT load forecasting that fuses a CNN branch (local features) and Transformer branch (long-range dependencies) via validation-tuned weights, regularized by a loss term encoding the piecewise parabolic temperature-demand relationship derived from ERCOT data, and augmented with SHAP (DeepExplainer) post-hoc explanations. On eight years of hourly ERCOT load (2018-2025) plus ASOS weather from three Texas stations, it reports test metrics of 713 MW MAE, 812 MW RMSE and 1.18% MAPE, with 20.7%/40.5% MAPE reductions on Hampel-flagged extremes relative to the individual branches and a 14.7% RMSE drop in ablation when the parabolic/ramp constraints are included; SHAP reveals temperature dominance in normal regimes shifting to wind/precipitation in extremes.
Significance. If the out-of-sample validity of the physics constraints is confirmed and the performance gains hold under rigorous validation, the work would advance interpretable, domain-grounded deep learning for power-system resilience by showing how an externally documented ERCOT temperature-demand relationship can regularize an ensemble under extreme weather while supplying actionable SHAP attributions. The explicit ablation and event-level analysis are strengths that support practical utility for grid operators.
major comments (3)
- [Methods (physics-informed loss and data preprocessing)] The description of the physics-informed loss states that the piecewise parabolic temperature-demand relationship is 'derived from ERCOT historical data,' yet the manuscript provides no explicit statement that breakpoints, coefficients, and functional form were fitted exclusively on the training split of the 2018-2025 series (rather than the full dataset or post-hoc on extremes). This is load-bearing for the central claim: without this guarantee, the ablation result of a 14.7% RMSE reduction cannot be attributed to genuine out-of-sample regularization and risks circularity.
- [Results and experimental setup] The experimental section reports concrete test metrics and relative gains on Hampel-flagged extremes but omits the exact train-test split dates, the complete list of baseline models (beyond the two branches), and any statistical significance testing (e.g., paired t-test or Diebold-Mariano) for the 20.7% and 40.5% MAPE improvements. These details are required to substantiate the headline performance claims.
- [Ablation study] The ablation study attributes the RMSE reduction to the parabolic and ramp constraints, but the scaling coefficients of the physics loss are listed among the free parameters; no sensitivity analysis or robustness check across reasonable ranges of these coefficients is reported, leaving the stability of the 14.7% gain unclear.
minor comments (2)
- [Data sources] The rationale for selecting only three specific ASOS stations should be stated explicitly with respect to spatial coverage of the ERCOT balancing authority.
- [Ensemble fusion] Clarify whether the ensemble weights are optimized once on a validation set or re-tuned per forecast horizon, and provide the exact validation procedure used.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which has strengthened the rigor and transparency of our work. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and additional analyses.
read point-by-point responses
-
Referee: The description of the physics-informed loss states that the piecewise parabolic temperature-demand relationship is 'derived from ERCOT historical data,' yet the manuscript provides no explicit statement that breakpoints, coefficients, and functional form were fitted exclusively on the training split of the 2018-2025 series (rather than the full dataset or post-hoc on extremes). This is load-bearing for the central claim: without this guarantee, the ablation result of a 14.7% RMSE reduction cannot be attributed to genuine out-of-sample regularization and risks circularity.
Authors: We agree that explicit confirmation of the training-only fitting is necessary to substantiate the out-of-sample nature of the regularization. The piecewise parabolic relationship and ramp constraints were derived solely from the training portion of the ERCOT series (2018–2023), with no involvement of the 2024–2025 test window or extreme-event subsets. We have revised the Methods section to include the precise fitting procedure, data window, and optimization details for the breakpoints and coefficients, thereby eliminating any ambiguity regarding circularity. revision: yes
-
Referee: The experimental section reports concrete test metrics and relative gains on Hampel-flagged extremes but omits the exact train-test split dates, the complete list of baseline models (beyond the two branches), and any statistical significance testing (e.g., paired t-test or Diebold-Mariano) for the 20.7% and 40.5% MAPE improvements. These details are required to substantiate the headline performance claims.
Authors: We acknowledge that these experimental details require greater prominence and completeness. The train-test split (2018–2023 training, 2024–2025 testing) was described in the Data section but has now been restated explicitly in the Experimental Setup subsection. We have expanded the baseline comparisons to include LSTM, ARIMA, and XGBoost models in addition to the CNN and Transformer branches. We have also added Diebold-Mariano tests, which confirm statistical significance (p < 0.01) for the reported MAPE reductions on extreme events. These updates appear in the revised Results and Experimental Setup sections. revision: yes
-
Referee: The ablation study attributes the RMSE reduction to the parabolic and ramp constraints, but the scaling coefficients of the physics loss are listed among the free parameters; no sensitivity analysis or robustness check across reasonable ranges of these coefficients is reported, leaving the stability of the 14.7% gain unclear.
Authors: We concur that sensitivity analysis is warranted to demonstrate robustness. We have conducted additional experiments sweeping the physics-loss scaling coefficients over the range [0.1, 10.0] and incorporated the results as a new supplementary figure. The 14.7% RMSE reduction remains stable (within ±1.8%) across this interval, with only marginal degradation outside the originally tuned value. This analysis has been added to the Ablation Study subsection. revision: yes
Circularity Check
No significant circularity; physics-informed loss grounded in external ERCOT relationship
full rationale
The derivation chain is self-contained. The physics-informed loss is explicitly tied to the established piecewise parabolic temperature-demand relationship of the ERCOT system rather than being reverse-engineered from the model's own outputs, fitted parameters, or the held-out test predictions. The ablation attributes the 14.7% RMSE reduction to these externally specified parabolic and ramp constraints applied during training, with no evidence that the functional form or breakpoints were chosen or tuned post-hoc on the full 2018-2025 series or on model residuals. Ensemble weights are validation-optimized and SHAP attributions are post-hoc, but neither reduces the core performance claims to a self-definition or a fitted-input-called-prediction. The framework therefore supplies independent content against the external ERCOT benchmark and does not require a self-citation chain for its central regularization step.
Axiom & Free-Parameter Ledger
free parameters (2)
- ensemble branch weights
- physics loss scaling coefficients
axioms (2)
- domain assumption ERCOT temperature-demand relationship is piecewise parabolic
- domain assumption SHAP DeepExplainer attributions accurately reflect model behavior on extreme events
Reference graph
Works this paper leans on
-
[1]
U.S. electricity load growth forecast jumps 81% led by data centers, industry: Grid Strategies,
R. Walton, “U.S. electricity load growth forecast jumps 81% led by data centers, industry: Grid Strategies,”Utility Dive, Dec. 13, 2023
2023
-
[2]
Electric load forecasting in smart grids using long-short-term-memory based recurrent neural network,
J. Zheng, C. Xu, Z. Zhang, and X. Li, “Electric load forecasting in smart grids using long-short-term-memory based recurrent neural network,” in Proc. 51st Annu. Conf. Inf. Sci. Syst. (CISS), Baltimore, MD, USA, Mar. 2017, pp. 1–6
2017
-
[3]
Short term load forecasting using time series analysis: A case study for Karnataka, India,
C. Nataraja, M. Gorawar, G. Shilpa, and J. S. Harsha, “Short term load forecasting using time series analysis: A case study for Karnataka, India,”Int. J. Eng. Sci. Innov. Technol., vol. 1, pp. 45–53, 2012
2012
-
[4]
Time series forecasting using a hybrid ARIMA and neural network model,
G. P. Zhang, “Time series forecasting using a hybrid ARIMA and neural network model,”Neurocomputing, vol. 50, pp. 159–175, 2003
2003
-
[5]
Automated Surface Observing Systems (ASOS),
NOAA, “Automated Surface Observing Systems (ASOS),” National Centers for Environmental Information. [Online]. Available: https://www.ncei.noaa.gov/products/land-based-station/ automated-surface-weather-observing-systems
-
[6]
Hourly load data archives
Electric Reliability Council of Texas (ERCOT), “Hourly load data archives.” [Online]. Available: https://www.ercot.com/gridinfo/load/ load hist
-
[7]
S. Debnathet al., “Extreme weather grid load forecasting using weather- informed LSTM and Transformer machine learning models,” inProc. 57th North Amer . Power Symp. (NAPS), Storrs, CT, USA, Oct. 2025, pp. 1–7, doi: 10.1109/naps66256.2025.11272315
-
[8]
S. Debnathet al., “Hybrid multi-scale deep learning enhanced electricity load forecasting using attention-based convolutional neural network and LSTM model,”IEEE Access, vol. 14, pp. 13423–13444, 2026, doi: 10.1109/ACCESS.2026.3656545
-
[9]
Long short-term memory,
S. Hochreiter and J. Schmidhuber, “Long short-term memory,”Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997
1997
-
[10]
A unified approach to interpreting model predictions,
S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2017, pp. 4765–4774
2017
-
[11]
Electricity consump- tion and temperature: Evidence from satellite data,
N. A. Elmassah, D. A. Leigh, and S. A. Pescatori, “Electricity consump- tion and temperature: Evidence from satellite data,” IMF Working Paper No. 21/22, Feb. 2021
2021
-
[12]
Extreme weather and climate vulner- abilities of the electric grid,
Oak Ridge National Laboratory, “Extreme weather and climate vulner- abilities of the electric grid,” U.S. DOE Rep., Sep. 2018
2018
-
[13]
Weather-informed forecasting for time series optimal power flow,
A. Unluet al., “Weather-informed forecasting for time series optimal power flow,”IEEE Access, vol. 12, pp. 92652–92662, 2024
2024
-
[14]
Isolated area load forecasting using linear regression analysis,
M. A. Mahmud, “Isolated area load forecasting using linear regression analysis,”Energy Power Eng., vol. 3, no. 4, pp. 547–550, 2011
2011
-
[15]
Physics-informed neural networks,
M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks,”J. Comput. Phys., vol. 378, pp. 686–707, 2019
2019
-
[16]
Consistent Individualized Feature Attribution for Tree Ensembles
S. M. Lundberg, G. G. Erion, and S.-I. Lee, “Consistent individualized feature attribution for tree ensembles,” arXiv:1802.03888, 2018
work page Pith review arXiv 2018
-
[17]
Attention is all you need,
A. Vaswaniet al., “Attention is all you need,” inProc. NeurIPS, 2017, pp. 5998–6008
2017
-
[18]
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
Y . Nieet al., “A time series is worth 64 words,” arXiv:2211.14730, 2022
work page internal anchor Pith review arXiv 2022
-
[19]
G. Wooet al., “ETSformer,” arXiv:2202.01381, 2022
-
[20]
Power grid load forecasting using a CNN-LSTM network,
W. Guoet al., “Power grid load forecasting using a CNN-LSTM network,”Appl. Sci., vol. 15, no. 5, p. 2435, 2025
2025
-
[21]
A hybrid LSTM-Transformer model,
V . Pentsoset al., “A hybrid LSTM-Transformer model,”IEEE Trans. Smart Grid, vol. 16, no. 3, pp. 2624–2634, May 2025
2025
-
[22]
The influence curve,
F. R. Hampel, “The influence curve,”J. Amer . Statist. Assoc., vol. 69, no. 346, pp. 383–393, 1974
1974
-
[23]
Time-series forecasting with deep learning,
B. Lim and S. Zohren, “Time-series forecasting with deep learning,” Philos. Trans. Roy. Soc. A, vol. 379, no. 2194, 2021
2021
-
[24]
Short-term load forecasting methods,
J. W. Taylor and P. E. McSharry, “Short-term load forecasting methods,” IEEE Trans. Power Syst., vol. 22, no. 4, pp. 2213–2219, Nov. 2007
2007
-
[25]
Improving reproducibility in machine learning re- search,
J. Pineauet al., “Improving reproducibility in machine learning re- search,”J. Mach. Learn. Res., vol. 22, no. 1, pp. 1–20, 2021
2021
-
[26]
Why should I trust you?
M. T. Ribeiro, S. Singh, and C. Guestrin, “Why should I trust you?” in Proc. ACM SIGKDD, 2016, pp. 1135–1144
2016
-
[27]
Reliability evaluation under extreme weather,
Z. Bieet al., “Reliability evaluation under extreme weather,”Appl. Energy, vol. 210, pp. 164–172, 2018
2018
-
[28]
A gentle introduction to conformal prediction,
A. N. Angelopoulos and S. Bates, “A gentle introduction to conformal prediction,”F ound. Trends Mach. Learn., vol. 16, no. 4, pp. 494–591, 2023
2023
-
[29]
S. Debnath,et al., ”AI-Driven Hybrid Deep Learning Framework for Short-Term Renewable Energy Forecasting under Extreme Weather Events,” 2025 7th International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), Pattaya City, Thailand, 2025, pp. 362-369, doi: 10.1109/ICECIE66637.2025.11363804
-
[30]
Probabilistic electric load forecasting: A tutorial review,
T. Hong and S. Fan, “Probabilistic electric load forecasting: A tutorial review,”Int. J. F orecasting, vol. 32, no. 3, pp. 914–938, 2016, doi: 10.1016/j.ijforecast.2015.11.011
-
[31]
Informer: Beyond efficient transformer for long sequence time-series forecasting,
H. Zhou,et al., “Informer: Beyond efficient transformer for long sequence time-series forecasting,” inProc. AAAI Conf. Artif. Intell., vol. 35, no. 12, 2021, pp. 11106–11115
2021
-
[32]
Information fusion58, 82–115 (2020), https://doi.org/10.1016/j.inffus.2019.12.012
A. B. Arrieta,et al., “Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward respon- sible AI,”Inf. Fusion, vol. 58, pp. 82–115, Jun. 2020, doi: 10.1016/j.inffus.2019.12.012
-
[33]
February 2021 Cold Weather Event: Lessons Learned,
North American Electric Reliability Corporation (NERC), “February 2021 Cold Weather Event: Lessons Learned,” Atlanta, GA, USA: NERC, Nov. 2021. [Online]. Available: https://www.nerc.com/pa/rrm/ea/Documents/NERC Lessons Learned February 2021 Cold Weather Event.pdf
2021
-
[34]
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
S. Bai,et al., “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,”arXiv preprint arXiv:1803.01271, 2018
work page internal anchor Pith review arXiv 2018
-
[35]
Bayesian Transformer for probabilistic load forecasting in smart grids,
S. Debnath and M. U. Mia, “Bayesian Transformer for probabilistic load forecasting in smart grids,”arXiv preprintarXiv:2603.07899, 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.