
arxiv: 2604.12304 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.SY · eess.SY

Beyond Weather Correlation: A Comparative Study of Static and Temporal Neural Architectures for Fine-Grained Residential Energy Consumption Forecasting in Melbourne, Australia

Pith reviewed 2026-05-10 16:13 UTC · model grok-4.3

classification 💻 cs.LG cs.SY eess.SY
keywords residential energy forecasting · LSTM · MLP · temporal autocorrelation · 5-minute granularity · weather correlation · smart grid · Melbourne households

The pith

Temporal autocorrelation in past energy consumption dominates weather data for accurate 5-minute residential forecasts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares a multilayer perceptron fed only static weather features against a long short-term memory network that sees only recent consumption history to forecast electricity use at 5-minute intervals in two real Melbourne households. The LSTM reaches R-squared values of 0.883 and 0.865 while the weather-only MLP scores -0.055 and 0.410, a gap the authors attribute to the strong sequential dependence within the consumption time series itself. This matters because short-term forecasts at this granularity support smart-grid balancing, demand response, and renewable integration, where even modest accuracy gains translate to operational savings. The work also notes that solar-equipped homes show partial weather sensitivity through generation effects, but the temporal signal still prevails overall.

Core claim

When an MLP receives only daily weather observations and an LSTM receives only 24-step sliding windows of 5-minute consumption values from the same 14-month Melbourne smart-meter records, the LSTM attains R² scores of 0.883 for the grid-connected house and 0.865 for the rooftop-solar house, compared with -0.055 and 0.410 for the corresponding MLPs. These R² gaps of 93.8 and 45.5 percentage points establish that temporal autocorrelation within the consumption sequence supplies the dominant predictive information at five-minute resolution.
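The percentage-point gaps can be checked directly from the reported scores. Because R² is unitless, one percentage point here means a difference of 0.01 in R². The dictionary below only restates the paper's numbers:

```python
# Reported test-set R² scores, restated from the abstract.
reported = {
    "House 3": {"lstm": 0.883, "mlp": -0.055},
    "House 4": {"lstm": 0.865, "mlp": 0.410},
}

def r2_gap_pp(r2_a: float, r2_b: float) -> float:
    """Difference between two R² scores in percentage points (0.01 R² = 1 point)."""
    return round((r2_a - r2_b) * 100, 1)

gaps = {house: r2_gap_pp(s["lstm"], s["mlp"]) for house, s in reported.items()}
print(gaps)  # {'House 3': 93.8, 'House 4': 45.5}
```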

What carries the argument

A controlled head-to-head comparison of a weather-only multilayer perceptron against a consumption-window LSTM, designed to isolate the contribution of sequential memory from that of static external variables.

If this is right

  • Short-term 5-minute energy forecasts should rely primarily on recent consumption history rather than current weather conditions.
  • In photovoltaic-equipped homes, weather data supplies modest indirect value through its correlation with solar generation.
  • Hybrid models that combine consumption windows with weather features are proposed as a next step to capture any residual meteorological signal.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Many practical short-horizon forecasting systems could be built with simple autoregressive components and little or no weather integration.
  • Real-time demand-response platforms might achieve better performance by prioritizing consumption-sequence features over meteorological feeds.
  • Federated learning across households could scale the temporal-model advantage while keeping individual usage data private.
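A federated extension could build on the FedAvg aggregation step of McMahan et al. [29]. Each household trains locally, and the server averages the resulting parameters, weighting each household by its local sample count, so raw consumption data never leaves the home. The sketch below covers only that server-side step and assumes each model has been flattened into an equal-length parameter list. The household names and sample counts are hypothetical:

```python
from typing import Dict, List

def fedavg(client_params: Dict[str, List[float]], client_sizes: Dict[str, int]) -> List[float]:
    """Server-side FedAvg step: average client parameters weighted by local sample count."""
    total = sum(client_sizes[c] for c in client_params)
    n_params = len(next(iter(client_params.values())))
    global_params = [0.0] * n_params
    for client, params in client_params.items():
        if len(params) != n_params:
            raise ValueError(f"{client} has {len(params)} parameters, expected {n_params}")
        weight = client_sizes[client] / total  # share of all training samples
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params

# Hypothetical example: two households whose models were already trained locally.
params = {"house_a": [1.0, 2.0], "house_b": [3.0, 6.0]}
sizes = {"house_a": 1000, "house_b": 3000}
print(fedavg(params, sizes))  # [2.5, 5.0]
```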

Load-bearing premise

The MLP is given only static weather features with no lagged consumption values while the LSTM is given only consumption windows with no weather inputs, cleanly separating the two information sources.
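The LSTM side of that separation can be made concrete. A 24-step window of 5-minute readings covers 24 × 5 = 120 minutes, which is the paper's 2-hour window. Stacking these windows produces inputs of shape (samples, 24, 1), the (batch, timesteps, features) layout that Keras recurrent layers expect. The sketch below uses only the standard library. The one-step-ahead target and the name `make_windows` are assumptions, because the forecast horizon is not stated here:

```python
from typing import List, Sequence, Tuple

STEP_MINUTES = 5
WINDOW = 24  # 24 steps x 5 min = 120 min, the paper's 2-hour window

def make_windows(
    series: Sequence[float], window: int = WINDOW
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build LSTM inputs of shape (samples, window, 1) and next-step targets.

    Only past consumption enters each window: no weather, no calendar features.
    """
    X: List[List[List[float]]] = []
    y: List[float] = []
    for start in range(len(series) - window):
        X.append([[v] for v in series[start:start + window]])  # one feature per timestep
        y.append(series[start + window])                        # the reading right after the window
    return X, y

readings = [0.1 * i for i in range(100)]  # toy series standing in for 5-minute readings
X, y = make_windows(readings)
print(len(X), len(X[0]), len(X[0][0]), WINDOW * STEP_MINUTES)  # 76 24 1 120
```

A series of N readings yields N − 24 windows. This is why minor comment 1 asks whether the 117,000-sample figure was counted before or after windowing.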

What would settle it

Retrain the MLP with the same lagged consumption windows added as inputs and check whether its R-squared values approach those of the LSTM.
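One way to run that check is to give the MLP a flat feature vector holding the same 24 lagged readings the LSTM sees, optionally followed by the day's replicated weather values. The sketch below covers only the input construction. `build_augmented_inputs` is a hypothetical helper, and the toy series and weather value are invented:

```python
from typing import List, Sequence, Tuple

def build_augmented_inputs(
    consumption: Sequence[float],
    weather_per_step: Sequence[Sequence[float]],
    window: int = 24,
) -> Tuple[List[List[float]], List[float]]:
    """Pair each target y[t] with [y[t-window], ..., y[t-1]] + weather[t].

    The lag block matches the LSTM's input window; the weather block holds
    whatever daily features have been replicated onto the 5-minute grid.
    """
    if len(consumption) != len(weather_per_step):
        raise ValueError("consumption and weather must be aligned step by step")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(window, len(consumption)):
        lags = list(consumption[t - window:t])
        X.append(lags + list(weather_per_step[t]))
        y.append(consumption[t])
    return X, y

# Toy usage: 30 steps, window of 24, one weather feature (e.g. a max temperature).
cons = [float(i) for i in range(30)]
weather = [[25.0]] * 30
X, y = build_augmented_inputs(cons, weather)
print(len(X), len(X[0]), y[0])  # 6 25 24.0
```

Any static regressor, including the paper's MLP, could then be trained on `X`. If its R² approached the LSTM's, the gap would be attributable to the information in the inputs rather than to the recurrent architecture itself.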

Figures

Figures reproduced from arXiv: 2604.12304 by Hao Wu, Prasad Nimantha Madusanka Ukwatta Hewage.

Figure 1. Neural network architectures: (a) Multilayer Perceptron mapping static weather …

Figure 2. Model performance comparison (R²) across all architectures and both households. The LSTM achieves an improvement of 93.3 percentage points over the MLP on House 3. Note: positive R² values indicate explanatory power above the mean baseline; negative values indicate performance below the trivial mean predictor.

Figure 3. House 3 test set: actual consumption vs. model predictions over a representative …

Figure 4. House 4 (Solar PV): grid draw, solar generation, and total consumption over …

Figure 5. House 3 seasonal analysis: (a) consumption distribution by season; (b) median …

Figure 6. Pearson correlation heatmaps: (a) House 3 weather features vs. grid consumption; …
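The note in the Figure 2 caption follows from the definition R² = 1 − SS_res / SS_tot, where SS_tot is the squared error of always predicting the test-set mean. The sketch below uses invented numbers to show all three regimes. scikit-learn's `sklearn.metrics.r2_score` computes the same quantity:

```python
from typing import Sequence

def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # mean predictor's squared error
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y, [2.5] * 4))             # 0.0: exactly the mean predictor
print(r2_score(y, [1.1, 1.9, 3.2, 3.8]))  # close to 1: far better than the mean
print(r2_score(y, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than the mean
```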
Original abstract

Accurate short-term residential energy consumption forecasting at sub-hourly resolution is critical for smart grid management, demand response programmes, and renewable energy integration. While weather variables are widely acknowledged as key drivers of residential electricity demand, the relative merit of incorporating temporal autocorrelation (the sequential memory of past consumption) over static meteorological features alone remains underexplored at fine-grained (5-minute) temporal resolution for Australian households. This paper presents a rigorous empirical comparison of a Multilayer Perceptron (MLP) and a Long Short-Term Memory (LSTM) recurrent network applied to two real-world Melbourne households: House 3 (a standard grid-connected dwelling) and House 4 (a rooftop solar photovoltaic-integrated household). Both models are trained on 14 months of 5-minute interval smart meter data (March 2023-April 2024) merged with official Bureau of Meteorology (BOM) daily weather observations, yielding over 117,000 samples per household. The LSTM, operating on 24-step (2-hour) sliding consumption windows, achieves coefficients of determination of R² = 0.883 (House 3) and R² = 0.865 (House 4), compared with R² = -0.055 and R² = 0.410 for the corresponding weather-driven MLPs, differences of 93.8 and 45.5 percentage points. These results establish that temporal autocorrelation in the consumption sequence dominates meteorological information for short-term forecasting at 5-minute granularity. Additionally, we demonstrate an asymmetry introduced by solar generation: for the PV-integrated household, the MLP achieves R² = 0.410, revealing implicit solar forecasting from weather-time correlations. A persistence baseline analysis and seasonal stratification contextualise model performance. We propose a hybrid weather-augmented LSTM and federated learning extensions as directions for future work.
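The persistence baseline is not defined here. A common choice at this resolution is the naive last-value forecast ŷ_t = y_{t−1}, which needs no training. The sketch below assumes that definition and contrasts a smooth toy trace with one that flips every step. Both traces are invented:

```python
from typing import List, Sequence

def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def persistence_r2(series: Sequence[float]) -> float:
    """Score the naive forecast y_hat[t] = y[t-1] against y[t] for t >= 1."""
    targets: List[float] = list(series[1:])
    preds: List[float] = list(series[:-1])
    return r2(targets, preds)

# Toy 5-minute load traces in kW (invented for illustration).
smooth = [0.40, 0.45, 0.52, 0.60, 0.70, 0.78, 0.84, 0.88, 0.90, 0.91]  # strongly autocorrelated
choppy = [0.2, 0.9] * 5                                               # flips every step

print(persistence_r2(smooth))  # high: the last reading is a good guess
print(persistence_r2(choppy))  # negative: worse than predicting the mean
```

Persistence relies on nothing but autocorrelation, so it scores well only when consecutive readings are close to each other.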

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper conducts an empirical comparison of a weather-driven Multilayer Perceptron (MLP) and an LSTM recurrent network for 5-minute resolution residential electricity consumption forecasting on two Melbourne households (one grid-connected, one with rooftop PV). Using 14 months of smart-meter data merged with daily BOM weather observations, it reports that the LSTM (operating on 24-step consumption windows) achieves R² = 0.883 and 0.865 while the MLP yields R² = -0.055 and 0.410, respectively. The authors conclude that temporal autocorrelation dominates static meteorological information at this granularity, supported by a persistence baseline and seasonal stratification, and suggest hybrid and federated extensions.

Significance. If the reported performance gap can be attributed solely to the presence versus absence of temporal structure (with clean isolation of inputs), the result would be a useful empirical demonstration that short-term fine-grained load forecasting benefits more from autoregressive modeling than from weather covariates alone. The use of real Australian household data, inclusion of a persistence baseline, and explicit treatment of the PV-induced asymmetry add concrete value for smart-grid and demand-response applications. The work is primarily empirical rather than theoretical, so its impact hinges on the reproducibility and internal validity of the experimental protocol.

major comments (3)
  1. Abstract (and presumed Methods section): the central claim that 'temporal autocorrelation in the consumption sequence dominates meteorological information' rests on the assumption that the MLP receives exclusively static daily weather features (no lagged consumption, no time-of-day, no calendar variables) while the LSTM receives exclusively 24-step consumption windows (no weather). The abstract describes 'weather-driven MLPs' and 'LSTM operating on 24-step sliding consumption windows' but supplies no explicit feature lists, no confirmation of separation, and no statement on whether daily weather is simply replicated across all 5-minute samples within a day. Without this isolation, the 93.8 and 45.5 percentage-point R² gaps cannot be attributed solely to autocorrelation versus meteorology.
  2. Abstract and experimental description: no information is provided on train/test split ratios, temporal ordering of the split, cross-validation strategy, hyperparameter search procedure, or feature scaling. These details are load-bearing for any claim that the LSTM's superiority is robust rather than an artifact of data leakage or overfitting to the 117,000-sample regime.
  3. Data merging description: daily BOM weather observations are merged with 5-minute consumption data. The paper does not specify the interpolation or assignment method (e.g., forward-fill, linear interpolation, or constant replication) nor whether any derived temporal features (hour-of-day, day-of-week) are inadvertently supplied to the MLP, which would confound the weather-only versus temporal-only contrast.
minor comments (2)
  1. The abstract states 'over 117,000 samples per household' but does not clarify whether this count is before or after any train/test partitioning or windowing, which affects interpretation of model capacity.
  2. The persistence baseline and seasonal stratification are mentioned but not quantified in the abstract; moving these numbers into the abstract would improve immediate readability.
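Major comment 2 turns on how the series is partitioned. The rebuttal describes a chronological split of roughly 80/20, a 10% validation block drawn from the training period, and min-max bounds learned from training data only. That kind of leakage-free protocol can be sketched as below. Placing the validation block at the end of the training span is an assumption, and the helper names are hypothetical:

```python
from typing import List, Sequence, Tuple

def chronological_split(
    series: Sequence[float], test_frac: float = 0.2, val_frac: float = 0.1
) -> Tuple[List[float], List[float], List[float]]:
    """Split without shuffling: [train | val] | test, preserving time order."""
    n_test = int(round(len(series) * test_frac))
    train_full = list(series[: len(series) - n_test])
    test = list(series[len(series) - n_test:])
    n_val = int(round(len(train_full) * val_frac))  # validation carved from the training period
    train = train_full[: len(train_full) - n_val]
    val = train_full[len(train_full) - n_val:]
    return train, val, test

def fit_minmax(train: Sequence[float]) -> Tuple[float, float]:
    """Learn scaling bounds from the training partition only."""
    return min(train), max(train)

def apply_minmax(values: Sequence[float], lo: float, hi: float) -> List[float]:
    span = hi - lo if hi > lo else 1.0
    # Values outside the training range map outside [0, 1]; they are not clipped.
    return [(v - lo) / span for v in values]

series = [float(i) for i in range(100)]
train, val, test = chronological_split(series)
lo, hi = fit_minmax(train)
print(len(train), len(val), len(test))  # 72 8 20
print(apply_minmax(test, lo, hi)[0])    # 80 / 71, above 1: beyond the training range
```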

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing experimental transparency. We have revised the manuscript to explicitly document feature sets, data preprocessing, splitting, and scaling procedures, thereby strengthening the isolation of temporal autocorrelation effects from meteorological inputs.

Point-by-point responses
  1. Referee (major comment 1): feature isolation between the MLP and LSTM inputs.

    Authors: We agree that the original text lacked an explicit feature inventory, which is needed to fully substantiate the claimed isolation. In the revised manuscript we have inserted a dedicated 'Input Features' subsection under Methods. It states that the MLP receives only the daily BOM weather variables (maximum and minimum temperature, rainfall, solar exposure, wind speed, and humidity), each value replicated constantly across every 5-minute timestamp of that calendar day. No lagged consumption values, hour-of-day, day-of-week, or any other temporal or calendar encodings are supplied to the MLP. The LSTM, by contrast, is provided exclusively with 24-step sliding windows of raw 5-minute consumption observations and receives no weather or calendar inputs whatsoever. This explicit separation is now documented, confirming that the reported R² differentials arise from the presence versus absence of temporal structure. revision: yes

  2. Referee (major comment 2): missing split, cross-validation, hyperparameter-search, and feature-scaling details.

    Authors: We accept that these protocol details were omitted and are essential for reproducibility. The revised manuscript now contains a 'Training Protocol' subsection that specifies: a roughly 80/20 temporal split (first 11 months for training, final 3 months held out for testing) to preserve chronological order and preclude leakage; no k-fold cross-validation, as is conventional for time-series forecasting; hyperparameter selection via grid search on a 10% validation partition drawn from the training period only; and per-feature min-max scaling whose parameters were computed solely on the training data and then applied unchanged to the test set. These additions demonstrate that the LSTM advantage is evaluated under a leakage-free regime. revision: yes

  3. Referee (major comment 3): unspecified method for merging daily weather with 5-minute consumption data.

    Authors: We have expanded the 'Data Preprocessing' paragraph to describe the merge explicitly. Because the BOM data are daily, each weather observation is assigned via constant replication to every 5-minute interval within its calendar day; no interpolation (linear or otherwise) is performed. The same subsection reiterates that the MLP feature vector contains none of the derived temporal encodings (hour-of-day, day-of-week, month, etc.) that could inadvertently leak sequential information. Consequently, the MLP remains strictly weather-driven while the LSTM remains strictly autoregressive, preserving the intended contrast. revision: yes
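The constant-replication merge described in this response can be sketched with the standard library alone. The sketch assumes calendar-day assignment in local time. The feature names and values are invented for illustration:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

FIVE_MIN = timedelta(minutes=5)

def replicate_daily(
    daily: Dict[str, Dict[str, float]],
    start: datetime,
    n_steps: int,
) -> List[Tuple[datetime, Dict[str, float]]]:
    """Give every 5-minute timestamp the weather record of its calendar day.

    Constant replication: no interpolation, so values change only at midnight.
    """
    rows = []
    for i in range(n_steps):
        ts = start + i * FIVE_MIN
        rows.append((ts, dict(daily[ts.date().isoformat()])))
    return rows

# Hypothetical daily records (values made up for illustration).
daily = {
    "2023-03-01": {"max_temp_c": 27.4, "rain_mm": 0.0},
    "2023-03-02": {"max_temp_c": 19.1, "rain_mm": 4.2},
}
rows = replicate_daily(daily, datetime(2023, 3, 1, 23, 50), 4)
for ts, feats in rows:
    print(ts.strftime("%Y-%m-%d %H:%M"), feats["max_temp_c"])
# 2023-03-01 23:50 27.4
# 2023-03-01 23:55 27.4
# 2023-03-02 00:00 19.1
# 2023-03-02 00:05 19.1
```

Under this scheme, all 288 five-minute steps of a day share one input vector. Without any time-of-day feature, a deterministic MLP therefore produces a single prediction per day.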

Circularity Check

0 steps flagged

No circularity: purely empirical model comparison with no derivation or self-referential predictions

full rationale

The paper reports a direct head-to-head evaluation of MLP (weather features) versus LSTM (consumption windows) on held-out 5-minute data from two households, with performance measured by R² on unseen test periods. No mathematical derivation, no fitted parameters renamed as predictions, no self-citation chains, and no ansatz or uniqueness theorem are invoked. The central claim follows from the observed performance gap on external test data rather than from any definitional equivalence or input recycling. Feature-isolation concerns raised by the referee affect the interpretability of the gap but do not create circularity in the reported results.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on empirical performance differences that depend on numerous unfixed neural network hyperparameters and data alignment choices not detailed in the abstract.

free parameters (2)
  • LSTM input window length
    24-step (2-hour) sliding windows selected for the temporal model input
  • Neural network architecture and training hyperparameters
    Number of layers, hidden units, learning rate, batch size, and regularization choices required to achieve the reported R² values
axioms (2)
  • domain assumption Daily weather observations can be merged with 5-minute consumption data without introducing substantial temporal misalignment error
    BOM data is daily while consumption is 5-minute
  • domain assumption The 14-month dataset is stationary enough for supervised training without explicit handling of concept drift or seasonal non-stationarity beyond stratification
    Standard assumption when training on a fixed historical window
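The seasonal stratification mentioned in the abstract amounts to scoring the same test predictions separately within each season. The sketch below assumes Southern Hemisphere meteorological seasons (summer = December to February) and predictions tagged by month. The records are invented:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Southern Hemisphere meteorological seasons (assumed grouping for Melbourne).
SEASON_BY_MONTH = {12: "summer", 1: "summer", 2: "summer",
                   3: "autumn", 4: "autumn", 5: "autumn",
                   6: "winter", 7: "winter", 8: "winter",
                   9: "spring", 10: "spring", 11: "spring"}

def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def r2_by_season(records: Sequence[Tuple[int, float, float]]) -> Dict[str, float]:
    """records: (month, actual, predicted). Returns R² computed within each season."""
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for month, actual, pred in records:
        ys, ps = groups[SEASON_BY_MONTH[month]]
        ys.append(actual)
        ps.append(pred)
    return {season: r2(ys, ps) for season, (ys, ps) in groups.items()}

records = [(1, 1.0, 1.1), (1, 2.0, 1.9), (1, 3.0, 3.0),   # January (summer)
           (7, 1.0, 2.0), (7, 2.0, 2.0), (7, 3.0, 2.0)]   # July (winter)
print(r2_by_season(records))
```

Each seasonal R² is measured against that season's own mean, so seasonal scores are not directly comparable with the overall figure.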



Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

  1. [1] Australian Energy Regulator, “State of the Energy Market 2023,” AER, Canberra,

  2. [2] [Online]. Available: https://www.aer.gov.au

  3. [3] Clean Energy Regulator, “Small-Scale Technology Certificates Data,” 2024. [Online]. Available: https://www.cleanenergyregulator.gov.au

  4. [4] Australian Energy Market Operator, “Five Minute Settlement,” AEMO, Melbourne,

  5. [5] [Online]. Available: https://www.aemo.com.au

  6. [6] T. Hong and S. Fan, “Probabilistic electric load forecasting: A tutorial review,” Int. J. Forecasting, vol. 32, no. 3, pp. 914–938, 2016.

  7. [7] A. Lahouar and J. B. H. Slama, “Day-ahead load forecast using random forest and expert input selection,” Energy Convers. Manag., vol. 103, pp. 1040–1051, 2015.

  8. [8] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, 5th ed. Hoboken, NJ: Wiley, 2015.

  9. [9] D. W. Bunn and E. D. Farmer, Comparative Models for Electrical Load Forecasting. Chichester, UK: Wiley, 1985.

  10. [10] A. D. Papalexopoulos and T. C. Hesterberg, “A regression-based approach to short-term system load forecasting,” IEEE Trans. Power Syst., vol. 5, no. 4, pp. 1535–1547, 1990.

  11. [11] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.

  12. [12] H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting—A novel pooling deep RNN,” IEEE Trans. Smart Grid, vol. 9, no. 5, pp. 5271–5280, 2018.

  13. [13] K. Chen et al., “Short-term load forecasting with deep residual networks,” IEEE Trans. Smart Grid, vol. 10, no. 4, pp. 3943–3952, 2019.

  14. [14] H. Zhou et al., “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proc. AAAI, 2021, pp. 11106–11115.

  15. [15] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,” in Proc. ICLR, 2023.

  16. [16] B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” Int. J. Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021.

  17. [17] S. Mirasgedis et al., “Models for mid-term electricity demand forecasting incorporating weather influences,” Energy, vol. 31, no. 2–3, pp. 208–227, 2006.

  18. [18] F. L. Quilumba et al., “Using smart meter data to improve the accuracy of intraday load forecasting,” IEEE Trans. Smart Grid, vol. 6, no. 2, pp. 911–918, 2015.

  19. [19] E. Mocanu, P. H. Nguyen, M. Gibescu, and W. L. Kling, “Deep learning for estimating building energy consumption,” Sustain. Energy, Grids Networks, vol. 6, pp. 91–99, 2016.

  20. [20] J. Z. Kolter and M. J. Johnson, “REDD: A public data set for energy disaggregation research,” in Proc. SustKDD Workshop, San Diego, CA, 2011.

  21. [21] A. Nottrott, J. Kleissl, and B. Washom, “Energy dispatch schedule optimization and cost benefit analysis for grid-connected, photovoltaic-battery storage systems,” Renew. Energy, vol. 55, pp. 230–240, 2013.

  22. [22] P. Bacher, H. Madsen, and H. A. Nielsen, “Online short-term solar power forecasting,” Solar Energy, vol. 83, no. 10, pp. 1772–1783, 2009.

  23. [23] T. Ahmad, H. Zhang, and B. Yan, “A review on renewable energy and electricity requirement forecasting models for smart grid and buildings,” Sustain. Cities Soc., vol. 55, p. 102052, 2020.

  24. [24] Z. Liu et al., “A new short-term load forecasting method of power system using improved genetic algorithm to optimize BP neural network,” Energy Buildings, vol. 72, pp. 361–369, 2014.

  25. [25] V. Cerqueira, L. Torgo, and I. Mozetič, “Evaluating time series forecasting models: An empirical study on performance estimation methods,” Mach. Learn., vol. 109, pp. 1997–2028, 2020.

  26. [26] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. ICLR, San Diego, CA, 2015.

  27. [27] Australian Energy Market Operator, “Residential Energy Consumption Benchmarks,” AEMO, Melbourne, 2020.

  28. [28] Ausgrid, “Solar Home Electricity Data,” 2013. [Online]. Available: https://www.ausgrid.com.au/Industry/Innovation/Data-to-share

  29. [29] B. McMahan et al., “Communication-efficient learning of deep networks from decentralized data,” in Proc. AISTATS, 2017, pp. 1273–1282.

A. Reproducibility Details

Software environment: Python 3.12, TensorFlow 2.x/Keras, NumPy, Pandas, Scikit-learn 1.8, Matplotlib 3.10, Seaborn 0.13 (Google Colab T4 GPU environment for original LSTM training; local Apple Si...