arxiv: 2604.23500 · v1 · submitted 2026-04-26 · 💻 cs.LG · cs.AI


Interpretable Physics-Informed Load Forecasting for U.S. Grid Resilience: SHAP-Guided Ensemble Validation in Hybrid Deep Learning Under Extreme Weather

Md Abubakkar, Sajib Debnath, Md. Uzzal Mia


Pith reviewed 2026-05-08 06:29 UTC · model grok-4.3


keywords load forecasting · physics-informed neural networks · SHAP · extreme weather · ERCOT · hybrid deep learning · grid resilience · interpretability


The pith

A hybrid CNN-Transformer ensemble regularized by ERCOT physics constraints improves electricity load forecasts during extreme weather.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an interpretable physics-informed ensemble framework that combines a CNN branch for local feature extraction with a Transformer branch for long-range dependencies in short-term electricity load forecasting. The branches are fused through a validation-optimized weighted ensemble and regularized using a physics-informed loss based on the piecewise parabolic temperature-demand relationship from ERCOT data. This setup is tested on eight years of hourly ERCOT load data fused with weather records, yielding strong overall accuracy and notable improvements on extreme events flagged by the Hampel method. SHAP analysis provides interpretability by showing how feature importance shifts between normal and extreme weather regimes. A sympathetic reader would care because opaque deep learning models limit operator trust in critical grid operations during the weather extremes that threaten U.S. grid resilience.
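The Hampel flagging mentioned above can be illustrated with a minimal sketch. The window length, the threshold of three scaled MADs, the function name `hampel_flags`, and the toy load values are all illustrative assumptions; this summary does not state the settings the authors used.

```python
import numpy as np

def hampel_flags(x, half_window=3, n_sigmas=3.0):
    """Flag points that deviate from the rolling median by more than
    n_sigmas scaled median absolute deviations (MAD)."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(x.size, dtype=bool)
    k = 1.4826  # scales the MAD to estimate sigma for Gaussian data
    for i in range(x.size):
        lo = max(0, i - half_window)
        hi = min(x.size, i + half_window + 1)  # window is truncated at the edges
        window = x[lo:hi]
        med = np.median(window)
        mad = k * np.median(np.abs(window - med))
        flags[i] = np.abs(x[i] - med) > n_sigmas * mad
    return flags

# Hypothetical hourly load in MW with one injected spike at index 4
load = np.array([40000, 40100, 39900, 40050, 52000,
                 40000, 39950, 40100, 40020], dtype=float)
flags = hampel_flags(load)
print(np.flatnonzero(flags))
```

In this toy series only the injected spike is flagged. A window with zero MAD (a perfectly flat stretch) would flag any nonzero deviation, so a production version would likely need a floor on the MAD.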

Core claim

The unified framework integrates CNN and Transformer branches fused by a weighted ensemble, regularized by physics-informed loss from the piecewise parabolic temperature-demand relationship, and interpreted via SHAP. On the ERCOT test data it reaches 713 MW MAE, 812 MW RMSE and 1.18% MAPE, with MAPE dropping 20.7% versus the Transformer and 40.5% versus the CNN on extreme events; the parabolic and ramp constraints account for a 14.7% RMSE reduction in ablation, while SHAP identifies a regime shift where temperature dominates normally but wind speed and precipitation gain influence during cold fronts and heatwaves.
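For readers less familiar with the three headline metrics, the sketch below shows how MAE, RMSE, and MAPE are conventionally computed, and how a percentage drop in MAPE can be read as a relative reduction (the abstract phrases it as "relative to" each branch). The helper names and the toy numbers are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error, in the units of the load (MW)."""
    return float(np.mean(np.abs(y - yhat)))

def rmse(y, yhat):
    """Root mean squared error, in the units of the load (MW)."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mape(y, yhat):
    """Mean absolute percentage error, in percent (y must be nonzero)."""
    return float(100.0 * np.mean(np.abs((y - yhat) / y)))

def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical observed and predicted loads in MW
y = np.array([50000.0, 60000.0, 40000.0])
yhat = np.array([49000.0, 61000.0, 40000.0])
print(mae(y, yhat), rmse(y, yhat), mape(y, yhat))

# Hypothetical MAPE values: 2.0% for a branch, 1.5% for the ensemble
print(relative_reduction(2.0, 1.5))
```

With these made-up values, a drop from 2.0% to 1.5% MAPE is a 25% relative reduction but only 0.5 percentage points, which is why the distinction matters when reading the 20.7% and 40.5% figures.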

What carries the argument

The physics-informed loss derived from the piecewise parabolic temperature-demand relationship of the ERCOT system, which embeds domain knowledge into the hybrid deep learning training to improve accuracy and interpretability.
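One way such a loss could be assembled is sketched below. This summary does not give the paper's exact functional form, so everything here is an assumption made for illustration: the two-branch parabola around a single breakpoint, the squared ramp-excess penalty, the coefficient values, and the names `parabolic_demand` and `physics_informed_loss`.

```python
import numpy as np

def parabolic_demand(temp_c, t_break=18.0, base=38000.0, a_cold=25.0, a_hot=40.0):
    """Hypothetical piecewise parabola: demand rises quadratically on
    either side of a comfort temperature, with separate curvatures."""
    temp_c = np.asarray(temp_c, dtype=float)
    curvature = np.where(temp_c < t_break, a_cold, a_hot)
    return base + curvature * (temp_c - t_break) ** 2

def physics_informed_loss(y_true, y_pred, temp_c,
                          lam_parab=0.1, lam_ramp=0.1, max_ramp=5000.0):
    """Data MSE plus two soft penalties (both assumed forms):
    deviation from the parabola, and hour-to-hour ramps above max_ramp."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    data_term = np.mean((y_true - y_pred) ** 2)
    parab_term = np.mean((y_pred - parabolic_demand(temp_c)) ** 2)
    ramp_excess = np.maximum(np.abs(np.diff(y_pred)) - max_ramp, 0.0)
    ramp_term = np.mean(ramp_excess ** 2)
    return float(data_term + lam_parab * parab_term + lam_ramp * ramp_term)

temps = np.array([18.0, 28.0, 8.0])           # hypothetical temperatures in deg C
on_curve = parabolic_demand(temps)            # 38000, 42000, 40500 MW
off_curve = np.array([38000.0, 48000.0, 40500.0])
print(physics_informed_loss(on_curve, on_curve, temps))
print(physics_informed_loss(on_curve, off_curve, temps))
```

The scaling coefficients `lam_parab` and `lam_ramp` play the role of the physics-loss scaling coefficients listed among the free parameters. In actual training the same expression would be written in a differentiable framework rather than plain NumPy.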

If this is right

The ensemble provides more accurate forecasts for grid operators managing reliability under extreme weather.
SHAP attributions highlight shifting influences of weather variables, enabling targeted model refinements.
Ablation studies confirm that the physics constraints contribute substantially to the observed error reductions.
The identified regime shift suggests that forecasting models may benefit from condition-specific adaptations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This physics-informed approach might extend to load forecasting in other regional grids if analogous temperature-demand relationships can be derived from their historical data.
The SHAP-guided insights could inform the development of adaptive ensemble weights that change based on detected weather regimes.
Future work could test the framework's robustness by applying it to projected climate data with intensified extreme weather events.

Load-bearing premise

The piecewise parabolic temperature-demand relationship derived from ERCOT historical data continues to hold without material deviation in the held-out test window and during the specific extreme weather events examined.
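A simple way to probe this premise is to fit the curve on the training window only and compare its residual error on the held-out window. The sketch below does this on synthetic data. The fixed breakpoint, the separate quadratic fit on each side, the noise level, and all function names are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def fit_piecewise_parabola(temp, load, t_break):
    """Fit one quadratic below the breakpoint and one at or above it."""
    cold = temp < t_break
    coef_cold = np.polyfit(temp[cold], load[cold], 2)
    coef_hot = np.polyfit(temp[~cold], load[~cold], 2)
    return coef_cold, coef_hot

def predict_piecewise(temp, t_break, coef_cold, coef_hot):
    return np.where(temp < t_break,
                    np.polyval(coef_cold, temp),
                    np.polyval(coef_hot, temp))

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

rng = np.random.default_rng(0)

def synthetic_window(n):
    """Hypothetical temperature (deg C) and load (MW) with Gaussian noise."""
    temp = rng.uniform(-5.0, 40.0, n)
    curvature = np.where(temp < 18.0, 25.0, 40.0)
    load = 38000.0 + curvature * (temp - 18.0) ** 2 + rng.normal(0.0, 300.0, n)
    return temp, load

t_train, l_train = synthetic_window(5000)   # stands in for the training years
t_test, l_test = synthetic_window(1000)     # stands in for the held-out window

coef_cold, coef_hot = fit_piecewise_parabola(t_train, l_train, t_break=18.0)
train_rmse = rmse(l_train, predict_piecewise(t_train, 18.0, coef_cold, coef_hot))
test_rmse = rmse(l_test, predict_piecewise(t_test, 18.0, coef_cold, coef_hot))
print(train_rmse, test_rmse)
```

Here the two windows share the same generating curve, so the errors are close. On real data, a held-out error far above the training error would suggest the relationship has drifted. The sketch does not enforce continuity at the breakpoint, and it treats the breakpoint as known rather than fitted.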

What would settle it

Applying the same framework to load and weather data from a different U.S. grid operator where the temperature-demand curve deviates from the parabolic form and checking whether MAPE on extreme events still improves over the individual branches.

Figures

Figures reproduced from arXiv: 2604.23500 by Md Abubakkar, Md. Uzzal Mia, Sajib Debnath.

**Figure 1.** Overall architecture of the proposed physics-informed interpretable ensemble framework.

**Figure 2.** Workflow diagram from data ingestion to operator-facing attribution-enriched forecast.

**Figure 3.** ERCOT demand characterization (2018–2025): (a) piecewise parabolic temperature–demand relationship with …

**Figure 4.** Error diagnostics on the 2024–2025 test window: ( …

**Figure 5.** Observed vs. predicted ERCOT demand across the full test year and during representative extreme events.

**Figure 6.** Physics-informed vs. unconstrained ensemble during the December 22, 2024 cold snap

**Figure 7.** Global SHAP feature importance for the proposed physics-informed …

Original abstract

Accurate short-term electricity load forecasting is a cornerstone of U.S. grid reliability; however, prevailing deep learning models remain opaque, limiting operator trust during extreme weather. A unified, interpretable, physics-informed ensemble framework is proposed, integrating a Convolutional Neural Network (CNN) branch for local feature extraction and a Transformer branch for long-range dependency modeling; the branches are fused through a validation-optimized weighted ensemble and regularized by a physics-informed loss derived from the piecewise parabolic temperature-demand relationship of the Electric Reliability Council of Texas (ERCOT) system. Post-hoc interpretability is provided through SHapley Additive exPlanations (SHAP) with the DeepExplainer backend, yielding global and event-level attributions. Using eight years of ERCOT hourly load data (2018-2025) fused with Automated Surface Observing System (ASOS) records from three Texas stations, the framework achieves 713 MW MAE, 812 MW RMSE, and 1.18% MAPE on the test window. For Hampel-flagged extreme events, MAPE falls by 20.7% relative to its Transformer branch and by 40.5% relative to its CNN branch; an ablation confirms that the parabolic and ramp constraints drive a 14.7% RMSE reduction. SHAP analysis reveals a regime shift: temperature dominates under normal operation, whereas wind speed and precipitation become more influential during cold fronts and heatwaves.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The hybrid CNN-Transformer with ERCOT physics loss shows concrete gains on extreme events, but the out-of-sample status of the physics term needs explicit confirmation before the ablation results can be taken at face value.


The main takeaway is that this paper presents a hybrid deep learning setup for ERCOT load forecasting that combines CNN and Transformer branches, adds a physics-informed loss from the known temperature-demand curve, and uses SHAP to explain shifts during extremes. The reported gains on Hampel-flagged events look useful for grid resilience work.

What the paper does well is deliver specific performance numbers along with an ablation that isolates the contribution of the parabolic and ramp constraints. The focus on interpretability and regime-specific feature importance is a plus for building operator trust. Fusing load data with local weather station records is a reasonable choice for the application.

The soft spots center on the physics loss validation. The abstract says the relationship is derived from ERCOT historical data, yet gives no indication that the fit was performed exclusively on the training split. If the full 2018-2025 series informed the breakpoints, the ablation showing a 14.7% RMSE drop and the larger relative improvements on extremes may overstate the out-of-sample benefit. This matches the stress-test note and needs explicit confirmation in the methods. Additional gaps include the lack of reported statistical tests for the differences and limited information on hyperparameter sensitivity.

This work is aimed at applied machine learning researchers and practitioners in energy systems who are building forecasting tools for regions with increasing weather extremes. It has enough structure and concrete claims to deserve a serious referee, even if revisions will be required to address the validation details. I would send it to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a hybrid physics-informed ensemble for short-term ERCOT load forecasting that fuses a CNN branch (local features) and Transformer branch (long-range dependencies) via validation-tuned weights, regularized by a loss term encoding the piecewise parabolic temperature-demand relationship derived from ERCOT data, and augmented with SHAP (DeepExplainer) post-hoc explanations. On eight years of hourly ERCOT load (2018-2025) plus ASOS weather from three Texas stations, it reports test metrics of 713 MW MAE, 812 MW RMSE and 1.18% MAPE, with 20.7%/40.5% MAPE reductions on Hampel-flagged extremes relative to the individual branches and a 14.7% RMSE drop in ablation when the parabolic/ramp constraints are included; SHAP reveals temperature dominance in normal regimes shifting to wind/precipitation in extremes.

Significance. If the out-of-sample validity of the physics constraints is confirmed and the performance gains hold under rigorous validation, the work would advance interpretable, domain-grounded deep learning for power-system resilience by showing how an externally documented ERCOT temperature-demand relationship can regularize an ensemble under extreme weather while supplying actionable SHAP attributions. The explicit ablation and event-level analysis are strengths that support practical utility for grid operators.

major comments (3)

[Methods (physics-informed loss and data preprocessing)] The description of the physics-informed loss states that the piecewise parabolic temperature-demand relationship is 'derived from ERCOT historical data,' yet the manuscript provides no explicit statement that breakpoints, coefficients, and functional form were fitted exclusively on the training split of the 2018-2025 series (rather than the full dataset or post-hoc on extremes). This is load-bearing for the central claim: without this guarantee, the ablation result of a 14.7% RMSE reduction cannot be attributed to genuine out-of-sample regularization and risks circularity.
[Results and experimental setup] The experimental section reports concrete test metrics and relative gains on Hampel-flagged extremes but omits the exact train-test split dates, the complete list of baseline models (beyond the two branches), and any statistical significance testing (e.g., paired t-test or Diebold-Mariano) for the 20.7% and 40.5% MAPE improvements. These details are required to substantiate the headline performance claims.
[Ablation study] The ablation study attributes the RMSE reduction to the parabolic and ramp constraints, but the scaling coefficients of the physics loss are listed among the free parameters; no sensitivity analysis or robustness check across reasonable ranges of these coefficients is reported, leaving the stability of the 14.7% gain unclear.
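The Diebold-Mariano test named in the second major comment can be sketched as follows for one-step-ahead forecasts with squared-error loss. This is a simplified form: it omits the autocovariance (HAC) correction needed for multi-step horizons and the small-sample adjustment, and the function name and synthetic error series are assumptions for illustration.

```python
import math
import numpy as np

def diebold_mariano(err_a, err_b):
    """Simplified DM test (horizon 1, squared-error loss).
    Returns the statistic and a two-sided normal p-value.
    A positive statistic means forecast B has the lower loss."""
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    d = err_a ** 2 - err_b ** 2                # loss differential per time step
    n = d.size
    dm = d.mean() / math.sqrt(d.var(ddof=1) / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|dm|))
    return float(dm), float(p_value)

rng = np.random.default_rng(1)
# Hypothetical forecast errors in MW for a single branch and for the ensemble
err_branch = rng.normal(0.0, 1000.0, 2000)
err_ensemble = rng.normal(0.0, 700.0, 2000)

dm_stat, p_value = diebold_mariano(err_branch, err_ensemble)
print(dm_stat, p_value)
```

Because hourly load errors are typically autocorrelated, a real analysis would likely need the HAC variance even at short horizons. The independent synthetic errors here sidestep that issue.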

minor comments (2)

[Data sources] The rationale for selecting only three specific ASOS stations should be stated explicitly with respect to spatial coverage of the ERCOT balancing authority.
[Ensemble fusion] Clarify whether the ensemble weights are optimized once on a validation set or re-tuned per forecast horizon, and provide the exact validation procedure used.
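To make the ensemble-fusion question concrete, the sketch below shows the simplest reading of "validation-optimized": a single scalar weight chosen once by grid search to minimize validation RMSE. The convex two-branch blend, the grid resolution, and the function name are assumptions; the paper may tune weights differently (for example, per horizon).

```python
import numpy as np

def best_ensemble_weight(y_val, pred_cnn, pred_transformer, grid=None):
    """Grid-search w in [0, 1] for the blend
    w * pred_cnn + (1 - w) * pred_transformer that minimizes validation RMSE."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    rmses = [
        np.sqrt(np.mean((y_val - (w * pred_cnn + (1.0 - w) * pred_transformer)) ** 2))
        for w in grid
    ]
    best = int(np.argmin(rmses))
    return float(grid[best]), float(rmses[best])

# Hypothetical validation loads in MW, with one branch biased high and one low
y_val = np.array([40000.0, 45000.0, 50000.0, 55000.0])
pred_cnn = y_val + 100.0          # over-predicts by 100 MW
pred_transformer = y_val - 300.0  # under-predicts by 300 MW

w_cnn, val_rmse = best_ensemble_weight(y_val, pred_cnn, pred_transformer)
print(w_cnn, val_rmse)
```

In this toy case the biases cancel at a CNN weight of 0.75. The chosen weight would then be frozen before touching the test window, which is the detail the comment asks the authors to state.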

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which has strengthened the rigor and transparency of our work. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and additional analyses.


Referee: The description of the physics-informed loss states that the piecewise parabolic temperature-demand relationship is 'derived from ERCOT historical data,' yet the manuscript provides no explicit statement that breakpoints, coefficients, and functional form were fitted exclusively on the training split of the 2018-2025 series (rather than the full dataset or post-hoc on extremes). This is load-bearing for the central claim: without this guarantee, the ablation result of a 14.7% RMSE reduction cannot be attributed to genuine out-of-sample regularization and risks circularity.

Authors: We agree that explicit confirmation of the training-only fitting is necessary to substantiate the out-of-sample nature of the regularization. The piecewise parabolic relationship and ramp constraints were derived solely from the training portion of the ERCOT series (2018–2023), with no involvement of the 2024–2025 test window or extreme-event subsets. We have revised the Methods section to include the precise fitting procedure, data window, and optimization details for the breakpoints and coefficients, thereby eliminating any ambiguity regarding circularity.
Revision: yes
Referee: The experimental section reports concrete test metrics and relative gains on Hampel-flagged extremes but omits the exact train-test split dates, the complete list of baseline models (beyond the two branches), and any statistical significance testing (e.g., paired t-test or Diebold-Mariano) for the 20.7% and 40.5% MAPE improvements. These details are required to substantiate the headline performance claims.

Authors: We acknowledge that these experimental details require greater prominence and completeness. The train-test split (2018–2023 training, 2024–2025 testing) was described in the Data section but has now been restated explicitly in the Experimental Setup subsection. We have expanded the baseline comparisons to include LSTM, ARIMA, and XGBoost models in addition to the CNN and Transformer branches. We have also added Diebold-Mariano tests, which confirm statistical significance (p < 0.01) for the reported MAPE reductions on extreme events. These updates appear in the revised Results and Experimental Setup sections.
Revision: yes
Referee: The ablation study attributes the RMSE reduction to the parabolic and ramp constraints, but the scaling coefficients of the physics loss are listed among the free parameters; no sensitivity analysis or robustness check across reasonable ranges of these coefficients is reported, leaving the stability of the 14.7% gain unclear.

Authors: We concur that sensitivity analysis is warranted to demonstrate robustness. We have conducted additional experiments sweeping the physics-loss scaling coefficients over the range [0.1, 10.0] and incorporated the results as a new supplementary figure. The 14.7% RMSE reduction remains stable (within ±1.8%) across this interval, with only marginal degradation outside the originally tuned value. This analysis has been added to the Ablation Study subsection.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; physics-informed loss grounded in external ERCOT relationship


The derivation chain is self-contained. The physics-informed loss is explicitly tied to the established piecewise parabolic temperature-demand relationship of the ERCOT system rather than being reverse-engineered from the model's own outputs, fitted parameters, or the held-out test predictions. The ablation attributes the 14.7% RMSE reduction to these externally specified parabolic and ramp constraints applied during training, with no evidence that the functional form or breakpoints were chosen or tuned post-hoc on the full 2018-2025 series or on model residuals. Ensemble weights are validation-optimized and SHAP attributions are post-hoc, but neither reduces the core performance claims to a self-definition or a fitted-input-called-prediction. The framework therefore supplies independent content against the external ERCOT benchmark and does not require a self-citation chain for its central regularization step.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard neural network building blocks, a domain-specific temperature-demand relationship previously observed in ERCOT data, and post-hoc attribution methods whose reliability is assumed rather than proven within the work.

free parameters (2)

ensemble branch weights: validation-optimized weights that combine CNN and Transformer outputs
physics loss scaling coefficients: regularization strengths for the parabolic and ramp constraint terms

axioms (2)

domain assumption: the ERCOT temperature-demand relationship is piecewise parabolic (invoked to construct the physics-informed loss term)
domain assumption: SHAP DeepExplainer attributions accurately reflect model behavior on extreme events (used to support claims about regime shifts in feature importance)


