Beyond Weather Correlation: A Comparative Study of Static and Temporal Neural Architectures for Fine-Grained Residential Energy Consumption Forecasting in Melbourne, Australia
Pith reviewed 2026-05-10 16:13 UTC · model grok-4.3
The pith
Temporal autocorrelation in past energy consumption dominates weather data for accurate 5-minute residential forecasts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When an MLP receives only daily weather observations and an LSTM receives only 24-step sliding windows of 5-minute consumption values from the same 14-month Melbourne smart-meter records, the LSTM attains R^2 scores of 0.883 for the grid-connected house and 0.865 for the rooftop-solar house, compared with -0.055 and 0.410 for the corresponding MLPs. These R^2 gaps of 0.938 and 0.455 (93.8 and 45.5 percentage points) are the basis for the claim that temporal autocorrelation within the consumption sequence supplies the dominant predictive information at five-minute resolution.
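To make the negative MLP score and the quoted gaps concrete, here is a minimal sketch of the coefficient of determination in plain Python. The toy arrays are invented for illustration and are not the paper's data; only the four R^2 values at the bottom come from the text above.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy data (invented for illustration, not from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
mean_pred = [2.5, 2.5, 2.5, 2.5]    # always predicting the mean
worse_pred = [4.0, 3.0, 2.0, 1.0]   # anti-correlated predictions

r2_mean = r2_score(y_true, mean_pred)     # a mean predictor scores exactly 0
r2_worse = r2_score(y_true, worse_pred)   # worse than the mean scores below 0

# Gaps recomputed from the R^2 values reported in the text.
gap_house3 = round(0.883 - (-0.055), 3)
gap_house4 = round(0.865 - 0.410, 3)
print(r2_mean, r2_worse, gap_house3, gap_house4)
```

Read this way, an R^2 of -0.055 means the weather-only MLP for the grid-connected house does slightly worse than predicting the test-set mean at every step.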
What carries the argument
The controlled head-to-head comparison of a weather-only multilayer perceptron against a consumption-window LSTM that isolates the contribution of sequential memory from static external variables.
If this is right
- Short-term 5-minute energy forecasts should rely primarily on recent consumption history rather than current weather conditions.
- In photovoltaic-equipped homes, weather data supplies modest indirect value through its correlation with solar generation.
- Hybrid models that combine consumption windows with weather features are proposed as a next step to capture any residual meteorological signal.
Where Pith is reading between the lines
- Many practical short-horizon forecasting systems could be built with simple autoregressive components and little or no weather integration.
- Real-time demand-response platforms might achieve better performance by prioritizing consumption-sequence features over meteorological feeds.
- Federated learning across households could scale the temporal-model advantage while keeping individual usage data private.
Load-bearing premise
The MLP is given only static weather features with no lagged consumption values while the LSTM is given only consumption windows with no weather inputs, cleanly separating the two information sources.
What would settle it
Retrain the MLP with the same lagged consumption windows added as inputs and check whether its R-squared values approach those of the LSTM.
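The proposed check amounts to giving the MLP a flat feature vector that contains both the weather values and the preceding 24 readings. The sketch below shows one way such rows could be assembled; `make_lagged_rows`, the two-value weather tuple, and the toy series are hypothetical and are not taken from the paper.

```python
def make_lagged_rows(consumption, weather_per_step, window=24):
    """Build (features, target) pairs: weather at step t plus the `window` readings before t."""
    if len(consumption) != len(weather_per_step):
        raise ValueError("consumption and weather_per_step must align step by step")
    rows, targets = [], []
    for t in range(window, len(consumption)):
        lags = consumption[t - window:t]  # the `window` readings immediately before step t
        rows.append(list(weather_per_step[t]) + list(lags))
        targets.append(consumption[t])
    return rows, targets

# Toy inputs (invented): 30 consecutive readings and a constant weather tuple.
consumption = [float(i) for i in range(30)]
weather_per_step = [(20.0, 0.0)] * 30  # hypothetical (max temperature, rainfall)

X, y = make_lagged_rows(consumption, weather_per_step, window=24)
print(len(X), len(X[0]), y[0])
```

Each row never includes the target reading itself, so a model trained on these rows sees only information available before the step it predicts.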
Original abstract
Accurate short-term residential energy consumption forecasting at sub-hourly resolution is critical for smart grid management, demand response programmes, and renewable energy integration. While weather variables are widely acknowledged as key drivers of residential electricity demand, the relative merit of incorporating temporal autocorrelation (the sequential memory of past consumption) over static meteorological features alone remains underexplored at fine-grained (5-minute) temporal resolution for Australian households. This paper presents a rigorous empirical comparison of a Multilayer Perceptron (MLP) and a Long Short-Term Memory (LSTM) recurrent network applied to two real-world Melbourne households: House 3 (a standard grid-connected dwelling) and House 4 (a rooftop solar photovoltaic-integrated household). Both models are trained on 14 months of 5-minute interval smart meter data (March 2023 to April 2024) merged with official Bureau of Meteorology (BOM) daily weather observations, yielding over 117,000 samples per household. The LSTM, operating on 24-step (2-hour) sliding consumption windows, achieves coefficients of determination of R^2 = 0.883 (House 3) and R^2 = 0.865 (House 4), compared to R^2 = -0.055 and R^2 = 0.410 for the corresponding weather-driven MLPs, differences of 93.8 and 45.5 percentage points. These results establish that temporal autocorrelation in the consumption sequence dominates meteorological information for short-term forecasting at 5-minute granularity. Additionally, we demonstrate an asymmetry introduced by solar generation: for the PV-integrated household, the MLP achieves R^2 = 0.410, revealing implicit solar forecasting from weather-time correlations. A persistence baseline analysis and seasonal stratification contextualise model performance. We propose a hybrid weather-augmented LSTM and federated learning extensions as directions for future work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts an empirical comparison of a weather-driven Multilayer Perceptron (MLP) and an LSTM recurrent network for 5-minute resolution residential electricity consumption forecasting on two Melbourne households (one grid-connected, one with rooftop PV). Using 14 months of smart-meter data merged with daily BOM weather observations, it reports that the LSTM (operating on 24-step consumption windows) achieves R² = 0.883 and 0.865 while the MLP yields R² = -0.055 and 0.410, respectively. The authors conclude that temporal autocorrelation dominates static meteorological information at this granularity, supported by a persistence baseline and seasonal stratification, and suggest hybrid and federated extensions.
Significance. If the reported performance gap can be attributed solely to the presence versus absence of temporal structure (with clean isolation of inputs), the result would be a useful empirical demonstration that short-term fine-grained load forecasting benefits more from autoregressive modeling than from weather covariates alone. The use of real Australian household data, inclusion of a persistence baseline, and explicit treatment of the PV-induced asymmetry add concrete value for smart-grid and demand-response applications. The work is primarily empirical rather than theoretical, so its impact hinges on the reproducibility and internal validity of the experimental protocol.
major comments (3)
- Abstract (and presumed Methods section): the central claim that 'temporal autocorrelation in the consumption sequence dominates meteorological information' rests on the assumption that the MLP receives exclusively static daily weather features (no lagged consumption, no time-of-day, no calendar variables) while the LSTM receives exclusively 24-step consumption windows (no weather). The abstract describes 'weather-driven MLPs' and 'LSTM operating on 24-step sliding consumption windows' but supplies no explicit feature lists, no confirmation of separation, and no statement on whether daily weather is simply replicated across all 5-minute samples within a day. Without this isolation, the 93.8 and 45.5 percentage-point R² gaps cannot be attributed solely to autocorrelation versus meteorology.
- Abstract and experimental description: no information is provided on train/test split ratios, temporal ordering of the split, cross-validation strategy, hyperparameter search procedure, or feature scaling. These details are load-bearing for any claim that the LSTM's superiority is robust rather than an artifact of data leakage or overfitting to the 117,000-sample regime.
- Data merging description: daily BOM weather observations are merged with 5-minute consumption data. The paper does not specify the interpolation or assignment method (e.g., forward-fill, linear interpolation, or constant replication) nor whether any derived temporal features (hour-of-day, day-of-week) are inadvertently supplied to the MLP, which would confound the weather-only versus temporal-only contrast.
minor comments (2)
- The abstract states 'over 117,000 samples per household' but does not clarify whether this count is before or after any train/test partitioning or windowing, which affects interpretation of model capacity.
- The persistence baseline and seasonal stratification are mentioned but not quantified in the abstract; moving these numbers into the abstract would improve immediate readability.
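For readers unfamiliar with the persistence baseline mentioned in the report, the sketch below shows the usual one-step form, where each reading is used as the forecast for the next 5-minute step. The one-step horizon and the toy load series are assumptions; the paper's exact baseline definition is not given in the abstract.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def persistence_forecast(series):
    """Use each reading as the forecast for the step that follows it."""
    predictions = series[:-1]
    targets = series[1:]
    return predictions, targets

# Toy load curve (invented for illustration, not from the paper).
load = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0]

predictions, targets = persistence_forecast(load)
baseline_r2 = r2_score(targets, predictions)
print(round(baseline_r2, 3))
```

A learned model is only informative to the extent that it beats this baseline on the same test period, which is why the report asks for the baseline numbers to be quoted alongside the LSTM scores.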
Simulated Author's Rebuttal
We thank the referee for the constructive feedback emphasizing experimental transparency. We have revised the manuscript to explicitly document feature sets, data preprocessing, splitting, and scaling procedures, thereby strengthening the isolation of temporal autocorrelation effects from meteorological inputs.
Point-by-point responses
- Referee: Abstract (and presumed Methods section): the central claim that 'temporal autocorrelation in the consumption sequence dominates meteorological information' rests on the assumption that the MLP receives exclusively static daily weather features (no lagged consumption, no time-of-day, no calendar variables) while the LSTM receives exclusively 24-step consumption windows (no weather). The abstract describes 'weather-driven MLPs' and 'LSTM operating on 24-step sliding consumption windows' but supplies no explicit feature lists, no confirmation of separation, and no statement on whether daily weather is simply replicated across all 5-minute samples within a day. Without this isolation, the 93.8 and 45.5 percentage-point R² gaps cannot be attributed solely to autocorrelation versus meteorology.
Authors: We agree that the original text lacked an explicit feature inventory, which is needed to fully substantiate the claimed isolation. In the revised manuscript we have inserted a dedicated 'Input Features' subsection under Methods. It states that the MLP receives only the daily BOM weather variables (maximum and minimum temperature, rainfall, solar exposure, wind speed, and humidity), each value replicated constantly across every 5-minute timestamp of that calendar day. No lagged consumption values, hour-of-day, day-of-week, or any other temporal or calendar encodings are supplied to the MLP. The LSTM, by contrast, is provided exclusively with 24-step sliding windows of raw 5-minute consumption observations and receives no weather or calendar inputs whatsoever. This explicit separation is now documented, confirming that the reported R² differentials arise from the presence versus absence of temporal structure. revision: yes
- Referee: Abstract and experimental description: no information is provided on train/test split ratios, temporal ordering of the split, cross-validation strategy, hyperparameter search procedure, or feature scaling. These details are load-bearing for any claim that the LSTM's superiority is robust rather than an artifact of data leakage or overfitting to the 117,000-sample regime.
Authors: We accept that these protocol details were omitted and are essential for reproducibility. The revised manuscript now contains a 'Training Protocol' subsection that specifies: an approximately 80/20 temporal split (first 11 months for training, final 3 months held out for testing) to preserve chronological order and preclude leakage; no k-fold cross-validation, as is conventional for time-series forecasting; hyperparameter selection performed via grid search on a 10% validation partition drawn from the training period only; and per-feature min-max scaling whose parameters were computed solely on the training data and then applied unchanged to the test set. These additions demonstrate that the LSTM advantage is evaluated under a leakage-free regime. revision: yes
- Referee: Data merging description: daily BOM weather observations are merged with 5-minute consumption data. The paper does not specify the interpolation or assignment method (e.g., forward-fill, linear interpolation, or constant replication) nor whether any derived temporal features (hour-of-day, day-of-week) are inadvertently supplied to the MLP, which would confound the weather-only versus temporal-only contrast.
Authors: We have expanded the 'Data Preprocessing' paragraph to describe the merge explicitly. Because the BOM data are daily, each weather observation is assigned via constant replication to every 5-minute interval within its calendar day; no interpolation (linear or otherwise) is performed. The same subsection reiterates that the MLP feature vector contains none of the derived temporal encodings (hour-of-day, day-of-week, month, etc.) that could inadvertently leak sequential information. Consequently, the MLP remains strictly weather-driven while the LSTM remains strictly autoregressive, preserving the intended contrast. revision: yes
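The preprocessing steps described in the responses (constant replication of daily weather, a chronological split, and min-max scaling fitted on the training period only) can be sketched as follows. This is an illustrative reading of the rebuttal, not the authors' code: the helper names, the two weather columns, and the 5-day toy calendar are all hypothetical.

```python
from datetime import datetime, timedelta

def replicate_daily(timestamps, daily_weather):
    """Give every 5-minute timestamp the weather row of its calendar day (no interpolation)."""
    return [list(daily_weather[ts.date()]) for ts in timestamps]

def chronological_split(rows, train_fraction=0.8):
    """Split in time order: the earliest rows train, the latest rows test."""
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]

def fit_min_max(train_rows):
    """Compute per-feature (min, max) from the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]

def apply_min_max(rows, bounds):
    """Scale rows with previously fitted bounds; constant features map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]

# Toy data (invented for illustration): 5 days of 5-minute steps, 288 per day.
start = datetime(2023, 3, 1)
timestamps = [start + timedelta(minutes=5 * i) for i in range(5 * 288)]
# Hypothetical daily rows: (max temperature in C, rainfall in mm).
daily_rows = [(20.0, 0.0), (22.0, 1.0), (24.0, 0.0), (26.0, 2.0), (30.0, 5.0)]
daily_weather = {
    (start + timedelta(days=d)).date(): row for d, row in enumerate(daily_rows)
}

features = replicate_daily(timestamps, daily_weather)
train, test = chronological_split(features, train_fraction=0.8)
bounds = fit_min_max(train)  # fitted on the first 4 days only
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)
print(bounds, test_scaled[0])
```

In this toy run the held-out day is hotter and wetter than anything in the training period, so its scaled values exceed 1. That is the expected behaviour when scaler parameters are not refitted on test data, and it is one visible sign that no test-period statistics leaked into training.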
Circularity Check
No circularity: purely empirical model comparison with no derivation or self-referential predictions
Full rationale
The paper reports a direct head-to-head evaluation of MLP (weather features) versus LSTM (consumption windows) on held-out 5-minute data from two households, with performance measured by R^2 on unseen test periods. No mathematical derivation, no fitted parameters renamed as predictions, no self-citation chains, and no ansatz or uniqueness theorem are invoked. The central claim follows from the observed performance gap on external test data rather than from any definitional equivalence or input recycling. Feature-isolation concerns raised by the skeptic affect interpretability of the gap but do not create circularity in the reported results.
Axiom & Free-Parameter Ledger
free parameters (2)
- LSTM input window length
- Neural network architecture and training hyperparameters
axioms (2)
- domain assumption: Daily weather observations can be merged with 5-minute consumption data without introducing substantial temporal misalignment error.
- domain assumption: The 14-month dataset is stationary enough for supervised training without explicit handling of concept drift or seasonal non-stationarity beyond stratification.
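One way to probe the stationarity assumption is to score predictions separately per season, which is what a seasonal stratification amounts to. The sketch below groups by Southern Hemisphere meteorological seasons; that season definition, the helper names, and the toy values are assumptions, since the paper's own stratification scheme is not described here.

```python
from collections import defaultdict

# Southern Hemisphere meteorological seasons (assumed; the paper may define them differently).
SEASON_BY_MONTH = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def r2_by_season(months, y_true, y_pred):
    """Group samples by season of their month and score each group separately."""
    groups = defaultdict(lambda: ([], []))
    for month, t, p in zip(months, y_true, y_pred):
        truths, preds = groups[SEASON_BY_MONTH[month]]
        truths.append(t)
        preds.append(p)
    return {season: r2_score(t, p) for season, (t, p) in groups.items()}

# Toy data (invented): perfect predictions in January, mean-only predictions in July.
months = [1, 1, 1, 7, 7, 7]
y_true = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
y_pred = [1.0, 2.0, 3.0, 6.0, 6.0, 6.0]

scores = r2_by_season(months, y_true, y_pred)
print(scores)
```

A large spread between seasonal scores would suggest that a single pooled R^2 hides non-stationarity that the ledger's second assumption sets aside.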
Reproducibility details
The paper (Ukwatta Hewage and Wu, 2026, running title "Residential Energy Forecasting: MLP vs LSTM") includes a section "A. Reproducibility Details". Software environment: Python 3.12, TensorFlow 2.x/Keras, NumPy, Pandas, Scikit-learn 1.8, Matplotlib 3.10, Seaborn 0.13 (Google Colab T4 GPU environment for original LSTM training; local Apple Si...