pith. machine review for the scientific record.

arXiv: 2604.23908 · v1 · submitted 2026-04-26 · 💻 cs.LG · cs.SY · eess.SY

Recognition: unknown

Machine Learning and Deep Learning Models for Short Term Electricity Price Forecasting in Australia's National Electricity Market


Pith reviewed 2026-05-08 06:19 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY
keywords electricity price forecasting · machine learning · gradient boosting · LSTM · National Electricity Market · price volatility · short-term forecasting · South Australia

The pith

Tree-based models outperform LSTM and SVR for electricity price forecasting in South Australia's volatile market, though all models exceed 90 percent mean absolute percentage error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets up a consistent benchmark to compare six machine learning algorithms on short-term electricity price and demand data from the South Australian region of Australia's National Electricity Market. It demonstrates that tree-based approaches, especially gradient boosted regression trees (GBRT), deliver higher accuracy than long short-term memory networks or support vector regression for prices, with an R-squared reaching 0.88. Even so, every method produces a mean absolute percentage error above 90 percent, and more than 65 percent of GBRT price predictions deviate from the actual value by more than 10 percent. The same features and split yield much stronger results for demand forecasting. The comparison matters because competitive power markets rely on reliable forecasts amid high renewable penetration that creates frequent price spikes and negative-price intervals. The study isolates algorithm differences through identical preprocessing and a chronological train-test split.

Core claim

Under a unified benchmark with identical lag features, rolling statistics, cyclic temporal encodings, and an 85/15 chronological split, tree-based models including GBRT outperform LSTM and SVR on price prediction with R-squared up to 0.88, while all models show mean absolute percentage error above 90 percent and over 65 percent of GBRT predictions carry relative errors above 10 percent; demand prediction reaches R-squared of 0.96 and mean absolute percentage error below 32 percent for AWMLSTM and GBRT, with 74.37 percent of GBRT samples inside 5 percent error.
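
The three kinds of figures quoted above (R-squared, mean absolute percentage error, and the share of samples inside a relative-error band) can diverge sharply on a series that contains spikes and near-zero values. The sketch below shows one common way to compute them in plain Python. The function names and the toy prices are illustrative assumptions, not the paper's code or data, and the paper's exact metric definitions are not given in this review.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error in percent.

    Zero actuals are skipped here; that handling is an assumption.
    """
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)


def share_within(actual, predicted, tolerance):
    """Percentage of samples whose relative error is at most `tolerance`."""
    hits = sum(
        1
        for a, p in zip(actual, predicted)
        if a != 0 and abs(a - p) <= tolerance * abs(a)
    )
    return 100.0 * hits / len(actual)


# Toy prices (illustrative only): one spike, one near-zero, one negative value.
actual = [50.0, 60.0, 300.0, 1.0, -20.0, 55.0]
predicted = [48.0, 63.0, 280.0, 6.0, -5.0, 54.0]

r2 = r_squared(actual, predicted)
error_pct = mape(actual, predicted)
within_10 = share_within(actual, predicted, 0.10)
print(round(r2, 3), round(error_pct, 1), round(within_10, 1))
```

On these toy numbers R-squared stays above 0.98 while MAPE exceeds 90 percent, because a single near-zero actual value dominates the percentage error. This is the same qualitative pattern as the paper's price results, though the toy data say nothing about the paper's actual error sources.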

What carries the argument

The unified benchmark framework that applies the same data preprocessing, feature engineering with lag features, rolling statistics, cyclic temporal encodings, and an 85/15 chronological train-test split across AWMLSTM, CatBoost, GBRT, LSTM, LightGBM, and SVR.
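
As a rough illustration of the pipeline described above, the following sketch builds lag features, rolling statistics, and a cyclic time-of-day encoding, then applies an 85/15 chronological split. The lag set, window length, half-hourly resolution, and synthetic series are assumptions chosen for the example; the paper's full feature list is not given in this review.

```python
import math
from datetime import datetime, timedelta


def build_features(timestamps, values, lags=(1, 2, 48), window=4):
    """Return (rows, targets) using only values observed before each step."""
    start = max(max(lags), window)
    rows, targets = [], []
    for t in range(start, len(values)):
        past = values[t - window:t]  # window excludes the current value
        mean = sum(past) / window
        std = math.sqrt(sum((v - mean) ** 2 for v in past) / window)
        minute_of_day = timestamps[t].hour * 60 + timestamps[t].minute
        angle = 2 * math.pi * minute_of_day / 1440
        row = [values[t - k] for k in lags]        # lag features
        row += [mean, std]                         # rolling statistics
        row += [math.sin(angle), math.cos(angle)]  # cyclic time-of-day encoding
        rows.append(row)
        targets.append(values[t])
    return rows, targets


def chronological_split(rows, targets, train_fraction=0.85):
    """Earliest samples go to training, latest to testing; no shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], targets[:cut], rows[cut:], targets[cut:]


# Synthetic half-hourly series with a daily cycle (illustrative only).
timestamps = [datetime(2024, 1, 1) + timedelta(minutes=30 * i) for i in range(500)]
values = [50 + 20 * math.sin(2 * math.pi * i / 48) for i in range(500)]

rows, targets = build_features(timestamps, values)
x_train, y_train, x_test, y_test = chronological_split(rows, targets)
print(len(x_train), len(x_test), len(rows[0]))
```

Each feature row uses only values that precede its target, and the split keeps every test sample later in time than every training sample. That ordering is what lets the held-out metrics approximate operational forecasting.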

If this is right

  • Tree-based models should be prioritized over LSTM and SVR for price forecasting in similar high-volatility electricity markets.
  • Hybrid models combining trees with transformers may improve capture of extreme price events.
  • Data augmentation for spikes and post-prediction error correction techniques could reduce the observed high relative errors.
  • Demand forecasting benefits substantially more from the same features and models, achieving lower errors across the board.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The persistent high errors suggest that external signals such as real-time renewable generation or weather data may be needed beyond historical lags to explain remaining volatility.
  • Applying the same benchmark to other NEM regions with varying renewable shares would reveal how well the tree-model advantage generalizes.
  • Custom loss functions that penalize negative prices and spikes differently during training could address the imbalance the current setup leaves untouched.
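
One way to read the custom-loss suggestion is as a weighted squared error whose weight depends on whether the actual price is a spike or negative. The sketch below returns the per-sample loss together with its first and second derivatives with respect to the prediction, which is the form many gradient boosting libraries ask for in a custom objective (the exact callback signature differs per library and is not shown). The threshold and weights are hypothetical values, not taken from the paper.

```python
def weighted_squared_error(actual, predicted, spike_threshold=300.0,
                           spike_weight=5.0, negative_weight=3.0):
    """Loss, gradient and hessian of 0.5 * w * (pred - actual)^2 per sample.

    The weight w is larger for spikes and negative prices (assumed values).
    """
    losses, grads, hessians = [], [], []
    for a, p in zip(actual, predicted):
        if a >= spike_threshold:
            w = spike_weight
        elif a < 0:
            w = negative_weight
        else:
            w = 1.0
        residual = p - a
        losses.append(0.5 * w * residual ** 2)
        grads.append(w * residual)  # first derivative w.r.t. the prediction
        hessians.append(w)          # second derivative w.r.t. the prediction
    return losses, grads, hessians


# Toy values: an ordinary price, a spike, and a negative price.
actual = [50.0, 400.0, -30.0]
predicted = [55.0, 300.0, -10.0]
losses, grads, hessians = weighted_squared_error(actual, predicted)
print(losses, grads, hessians)
```

Whether such weighting would actually reduce the relative errors reported in the paper is untested here; it only makes the idea of treating spikes and negative prices differently concrete.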

Load-bearing premise

The chosen lag features, rolling statistics, cyclic encodings, and chronological split adequately handle non-stationarity, negative prices, and structural changes such as the shift to five-minute settlement without bias or missing future regime shifts.

What would settle it

Retraining and testing the same models on data split around the five-minute settlement transition date to check whether the performance ranking of tree-based models over LSTM and SVR remains stable or reverses.
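
A minimal sketch of the evaluation half of that check follows: given held-out predictions from several models, it ranks them by mean absolute error separately before and after a transition date. The model names, toy values, and the transition date are placeholders; the real settlement commencement date would have to be substituted, and the retraining step is not shown.

```python
from datetime import datetime, timedelta


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def ranking_by_regime(timestamps, actual, predictions, transition):
    """Rank models by MAE separately before and after `transition`."""
    result = {}
    regimes = (("pre", lambda t: t < transition),
               ("post", lambda t: t >= transition))
    for name, keep in regimes:
        idx = [i for i, t in enumerate(timestamps) if keep(t)]
        scores = {
            model: mean_absolute_error([actual[i] for i in idx],
                                       [pred[i] for i in idx])
            for model, pred in predictions.items()
        }
        result[name] = sorted(scores, key=scores.get)  # best model first
    return result


# Toy data; the transition date is a placeholder, not the real rule-change date.
timestamps = [datetime(2000, 1, 1) + timedelta(days=i) for i in range(6)]
transition = datetime(2000, 1, 4)
actual = [10.0, 12.0, 11.0, 40.0, -5.0, 60.0]
predictions = {
    "model_a": [10.5, 11.5, 11.2, 20.0, 10.0, 30.0],
    "model_b": [12.0, 14.0, 9.0, 35.0, 0.0, 50.0],
}
ranking = ranking_by_regime(timestamps, actual, predictions, transition)
print(ranking)
```

In this toy case the ranking reverses across the transition. A reversal of that kind on the real data would undercut the headline comparison, while a stable ranking would support it.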

Original abstract

Short term electricity price forecast is essential in competitive power markets, yet electricity price series exhibit high volatility, irregularity, and non-stationarity. This phenomenon is pronounced in the South Australian region of the National Electricity Market, where high renewable penetration drives price volatility and frequent negative price intervals, while structural changes such as the transition to five-minute settlement further complicate forecast. To address these challenges, this study develops a unified benchmark framework. Under identical data preprocessing, feature engineering with lag features, rolling statistics, cyclic temporal encodings, and so on, and an 85% to 15% chronological train test split, six algorithms are systematically compared, including AWMLSTM, CatBoost, GBRT, LSTM, LightGBM, and SVR. The results show that for price prediction, tree-based models, especially GBRT with an R squared value of 0.88, generally outperform LSTM and SVR. However, all models achieve a mean absolute percentage error above 90%, and more than 65% of GBRT predictions have relative errors above 10%, which highlights the inherent difficulty of price forecast. For demand prediction, all models perform substantially better than in price prediction. AWMLSTM and GBRT achieve an R2 value of 0.96 with mean absolute percentage error below 32%, and GBRT has 74.37% of samples within 5% error, while LSTM and SVR perform less accurately in both tasks. Future improvements should focus on hybrid models such as tree plus transformers, data augmentation for extreme events, and error correction to better capture price spikes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript develops a unified benchmark for short-term electricity price and demand forecasting in South Australia's NEM using six ML/DL models (AWMLSTM, CatBoost, GBRT, LSTM, LightGBM, SVR). With lag features, rolling statistics, cyclic encodings, and an 85/15 chronological split, it reports that tree-based models, particularly GBRT (R²=0.88 for price), outperform LSTM and SVR, but all models exhibit MAPE >90% for prices, underscoring forecasting difficulty; demand forecasting achieves higher accuracy (R²=0.96, MAPE<32%).

Significance. If the empirical comparisons hold under rigorous validation, this provides a valuable reference point for the challenges of price prediction in volatile, renewable-heavy markets with negative prices and market rule changes. The explicit reporting of poor MAPE and relative error distributions is a strength, as is the side-by-side evaluation of tree ensembles versus recurrent networks under identical preprocessing.

major comments (3)
  1. Methodology (data split and feature engineering): The single 85/15 chronological split and uniform application of lag/rolling/cyclic features across the entire series do not include regime indicators or pre/post-transition analysis for the five-minute settlement structural change highlighted in the abstract. This is load-bearing for the central claim of GBRT's R²=0.88 superiority, as tree models may overfit pre-shift patterns while post-shift volatility affects other models differently.
  2. Results and discussion: Although multiple metrics (R², MAPE, error distributions) are used, the manuscript lacks statistical significance tests (e.g., Diebold-Mariano) for model performance differences and details on hyperparameter optimization procedures, which are necessary to substantiate the outperformance claims given the high price volatility.
  3. Abstract and introduction: The handling of negative prices is mentioned as a challenge but not detailed in the preprocessing or model inputs; this could impact the MAPE calculations and relative error assessments for price forecasting.
minor comments (3)
  1. Abstract: The phrase 'and so on' in the feature description is imprecise; a complete list of engineered features should be provided for reproducibility.
  2. Results: The percentage of samples within 5% error for demand (74.37% for GBRT) is useful but should be accompanied by similar breakdowns for price predictions beyond the >10% relative error note.
  3. Conclusion: Suggestions for future work (hybrid models, data augmentation) are appropriate but could reference specific prior work on transformer-based time series or spike detection in electricity markets.
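
For reference, the Diebold-Mariano test requested in major comment 2 compares two forecasts through the mean of their loss differential. The sketch below is a simplified version under stated assumptions: squared-error loss, one-step-ahead forecasts (so no autocorrelation correction of the variance), a normal approximation for the p-value, and no small-sample adjustment. The synthetic forecasts are illustrative and unrelated to the paper's models.

```python
import math


def diebold_mariano(actual, forecast_a, forecast_b):
    """DM statistic and two-sided p-value for equal squared-error accuracy.

    Assumes a one-step horizon; longer horizons need a long-run variance
    estimate instead of the plain sample variance used here.
    """
    d = [(y - fa) ** 2 - (y - fb) ** 2
         for y, fa, fb in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value


# Synthetic series: forecast_a has small errors, forecast_b larger ones.
actual = [50 + 10 * math.sin(i / 3) for i in range(60)]
forecast_a = [y + 1 + 0.5 * math.cos(i) for i, y in enumerate(actual)]
forecast_b = [y + 4 + 2 * math.sin(0.7 * i) for i, y in enumerate(actual)]

stat, p_value = diebold_mariano(actual, forecast_a, forecast_b)
print(stat, p_value)
```

A negative statistic means the first forecast has the lower average loss; a small p-value indicates the difference is unlikely under the null of equal accuracy.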

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful and constructive comments, which have helped us improve the clarity and robustness of our manuscript. We address each major comment in detail below and indicate the revisions made to the manuscript.

Point-by-point responses
  1. Referee: Methodology (data split and feature engineering): The single 85/15 chronological split and uniform application of lag/rolling/cyclic features across the entire series do not include regime indicators or pre/post-transition analysis for the five-minute settlement structural change highlighted in the abstract. This is load-bearing for the central claim of GBRT's R²=0.88 superiority, as tree models may overfit pre-shift patterns while post-shift volatility affects other models differently.

    Authors: We agree that explicitly accounting for the five-minute settlement transition is important given its mention in the abstract. The chronological 85/15 split was employed to ensure temporal causality and prevent information leakage from future to past, which is a standard practice in time-series forecasting. Nevertheless, to address the referee's concern, we have introduced a binary regime indicator feature distinguishing pre- and post-transition periods in the revised feature engineering pipeline. Furthermore, we have added a supplementary analysis splitting the test set into pre- and post-transition subsets, where GBRT continues to demonstrate superior performance relative to the other models. This supports that the reported superiority is not solely due to overfitting pre-shift patterns. revision: yes

  2. Referee: Results and discussion: Although multiple metrics (R², MAPE, error distributions) are used, the manuscript lacks statistical significance tests (e.g., Diebold-Mariano) for model performance differences and details on hyperparameter optimization procedures, which are necessary to substantiate the outperformance claims given the high price volatility.

    Authors: We concur that rigorous statistical validation is essential, particularly in the presence of high volatility. In the revised manuscript, we have detailed the hyperparameter optimization procedure, which utilized a grid search over a predefined parameter space combined with rolling time-series cross-validation on the training set to select optimal hyperparameters for each model. Additionally, we have incorporated Diebold-Mariano tests to assess the statistical significance of performance differences between GBRT and the other models. The results, now presented in a new table, indicate that the improvements are statistically significant for the majority of comparisons. These additions bolster the credibility of our outperformance claims. revision: yes

  3. Referee: Abstract and introduction: The handling of negative prices is mentioned as a challenge but not detailed in the preprocessing or model inputs; this could impact the MAPE calculations and relative error assessments for price forecasting.

    Authors: We appreciate this observation. Negative prices are a key characteristic of the South Australian market and are preserved without any shifting or absolute transformation in the target variable to maintain their economic significance. The same applies to input features derived from prices. Regarding MAPE, we employ the conventional formula based on absolute percentage errors, which remains well-defined for negative values but can indeed be sensitive to near-zero prices. We have expanded the preprocessing subsection in the revised manuscript to explicitly describe this approach and discuss its implications for interpreting the high MAPE values observed. revision: yes
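
For reference, the conventional definition the response appears to refer to is written out below, with y_t the actual price and \hat{y}_t the forecast. This is the standard textbook form, assumed here because the review does not quote the paper's exact formula.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|

% Illustrative single-sample terms (made-up values, not from the paper):
%   y_t = -20,   \hat{y}_t = -5 :  |(-20 - (-5)) / (-20)| = 0.75  (75%)
%   y_t = 0.50,  \hat{y}_t = 5  :  |(0.50 - 5) / 0.50|    = 9.00  (900%)
%   y_t = 0                     :  undefined (division by zero)
```

The absolute value keeps each term non-negative for negative prices, but the term grows without bound as y_t approaches zero and is undefined at exactly zero. This is why a market with frequent near-zero intervals can show very high MAPE alongside a high R-squared.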

Circularity Check

0 steps flagged

Empirical benchmark with held-out chronological split shows no circularity

Full rationale

The paper conducts a standard empirical comparison of six ML/DL models (GBRT, CatBoost, LightGBM, LSTM, AWMLSTM, SVR) for electricity price and demand forecasting. It applies fixed preprocessing, lag/rolling/cyclic features, and an 85/15 chronological train-test split, then reports direct test-set metrics (R²=0.88 for GBRT on price, MAPE>90% for all, etc.). The paper contains no equations, derivations, fitted parameters renamed as predictions, self-citations for uniqueness theorems, or ansatzes; performance numbers are computed after training on held-out data and do not reduce to the inputs by construction. The reported results are self-contained experimental measurements.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The claims rest on standard time-series forecasting assumptions plus the specific data split and feature choices; no new entities are postulated.

free parameters (1)
  • Model hyperparameters (learning rates, tree depths, LSTM units, etc.)
    Each algorithm's parameters are tuned on the training portion to maximize reported metrics.
axioms (2)
  • domain assumption: Chronological 85/15 split prevents leakage and simulates operational forecasting conditions
    Invoked when describing the train-test division; standard but assumes no unmodeled structural breaks after the split.
  • domain assumption: Lag features, rolling statistics, and cyclic encodings capture the relevant temporal structure
    Stated in the feature-engineering description; the paper does not test whether these features are sufficient for the observed non-stationarity.
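
One way to probe the first axiom is to replace the single split with several expanding-window folds, so that a structural break shows up as fold-to-fold variation in the metrics. The sketch below generates such folds as index ranges. The fold count and minimum training fraction are assumptions for illustration, and this is not the procedure used in the paper.

```python
def expanding_window_splits(n_samples, n_folds=5, min_train_fraction=0.5):
    """Yield (train_indices, test_indices); each test block follows its training block."""
    first_cut = int(n_samples * min_train_fraction)
    fold_size = (n_samples - first_cut) // n_folds
    for k in range(n_folds):
        cut = first_cut + k * fold_size
        # The last fold absorbs any remainder so that no sample is dropped.
        end = n_samples if k == n_folds - 1 else cut + fold_size
        yield range(0, cut), range(cut, end)


splits = list(expanding_window_splits(1000))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx.start, test_idx.stop)
```

Every fold trains only on indices that precede its test block, so the no-leakage property of the chronological split is preserved while the evaluation covers more than one period.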

pith-pipeline@v0.9.0 · 9928 in / 1449 out tokens · 101188 ms · 2026-05-08T06:19:00.883488+00:00 · methodology

