pith. machine review for the scientific record.

arxiv: 2605.09722 · v1 · submitted 2026-05-10 · 💻 cs.LG

Recognition: no theorem link

Benchmarking Transformer and xLSTM for Time-Series Forecasting of Heat Consumption

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 03:57 UTC · model grok-4.3

classification 💻 cs.LG
keywords heat demand forecasting · time series forecasting · xLSTM · Transformer · benchmarking · district heating · energy consumption · computational efficiency

The pith

xLSTM delivers the lowest RMSE for building heat demand forecasts while simpler networks match accuracy at far lower computational cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks xLSTM and Transformer architectures against simpler models for short-term heat consumption prediction using hourly data pooled from 25 German buildings. Accurate forecasts support reliable, cost-efficient operation of district heating networks, yet building-level series depend heavily on outdoor temperature and individual usage patterns, which makes them hard to predict. The evaluation covers three-hour and 24-hour horizons and tracks both error metrics and resource demands such as training time and parameter count. Results show that while xLSTM leads on root mean square error and one Transformer variant leads on mean absolute error, a basic fully connected network produces competitive forecasts, indicating that the added complexity brings only marginal benefit.

Core claim

On pooled hourly data from 25 heterogeneous buildings, the xLSTM architecture records the lowest root mean square errors of 19.88 kWh for three-hour forecasts and 21.47 kWh for 24-hour forecasts, while the Temporal Fusion Transformer attains the lowest mean absolute error of 9.16 kWh at the three-hour horizon. These models are trained to generalize across diverse building stocks rather than on single structures. The work also measures training duration and trainable parameter counts, revealing that the advanced architectures require substantially more resources than a traditional fully connected network, which still delivers good predictive performance for the same forecasting tasks.
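The two headline metrics reward different error profiles, which is how one model can lead on RMSE while another leads on MAE. A minimal sketch with illustrative numbers (not the paper's data):

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large misses quadratically."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error: every kWh of error counts equally."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Illustrative hourly heat readings (kWh), not values from the paper.
actual   = [40.0, 42.0, 55.0]
forecast = [38.0, 41.0, 70.0]

# Errors are 2, 1, and 15 kWh. The one large miss dominates RMSE but
# only averages into MAE, so the two metrics can rank models differently.
print(rmse(actual, forecast))  # ≈ 8.76
print(mae(actual, forecast))   # 6.0
```

Because RMSE squares each residual, a single large miss dominates it, while MAE weighs every kilowatt-hour of error equally; xLSTM leading on RMSE while the TFT leads on MAE is therefore not a contradiction.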

What carries the argument

The pooled multi-building training benchmark that evaluates cross-building generalization together with paired measurement of forecast error against training time and parameter count.

If this is right

  • xLSTM records the lowest RMSE for both three-hour and 24-hour forecast horizons.
  • The Temporal Fusion Transformer records the lowest MAE for three-hour forecasts.
  • A traditional fully connected network achieves competitive accuracy with far fewer parameters and shorter training times.
  • The marginal accuracy gains of the advanced models come at substantial increases in computational resource demand.
  • Sustainability concerns arise for xLSTM and Transformer models in practical district heating applications because of their training costs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pattern of diminishing returns from complex sequence models may appear in other forecasting tasks that include strong exogenous drivers such as temperature.
  • Scaling the benchmark to additional buildings or longer time spans would test whether the observed resource-accuracy trade-off persists.
  • If simpler models generalize reliably, operators of district heating networks could reduce the energy footprint of maintaining forecast systems.

Load-bearing premise

Training on pooled data from 25 heterogeneous buildings produces models that generalize fairly across the building stock without hidden data leakage or unequal hyperparameter tuning across architectures.
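A minimal sketch of the kind of horizon-aware temporal split this premise presumes; the split fractions and the gap policy are assumptions for illustration, not details taken from the paper:

```python
def temporal_split(timestamps, train_frac=0.7, val_frac=0.15, horizon=24):
    """Split chronologically ordered hourly indices into train/val/test,
    leaving a `horizon`-sized gap at each boundary so no forecast window
    in a later set overlaps targets that an earlier set has seen."""
    n = len(timestamps)
    t_end = int(n * train_frac)
    v_end = int(n * (train_frac + val_frac))
    train = timestamps[:t_end]
    val   = timestamps[t_end + horizon:v_end]
    test  = timestamps[v_end + horizon:]
    return train, val, test

hours = list(range(1000))          # stand-in for the 2017-2025 hourly stamps
train, val, test = temporal_split(hours)
assert max(train) + 24 < min(val)  # no 24-hour window straddles the cut
assert max(val) + 24 < min(test)
```

Pooling buildings adds a second leakage axis (the same building appearing in train and test periods), which a split like this does not address on its own; that is exactly the ambiguity the referee report flags.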

What would settle it

Retraining every architecture on the same 25-building pool with an identical hyperparameter search budget and then measuring whether the fully connected network's RMSE on a new test building stays within 10 percent of the xLSTM value.
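The settling criterion reduces to a simple relative-tolerance check. The xLSTM reference value is the paper's reported 24-hour RMSE; the FC result on a new test building is a hypothetical placeholder:

```python
def within_tolerance(candidate_rmse, reference_rmse, tol=0.10):
    """True when the simpler model's RMSE stays within `tol` (relative)
    of the reference value, i.e. the 10 percent criterion above."""
    return candidate_rmse <= reference_rmse * (1.0 + tol)

xlstm_rmse = 21.47               # paper's 24-hour xLSTM RMSE (kWh)
fc_rmse_on_new_building = 23.0   # hypothetical held-out-building measurement

print(within_tolerance(fc_rmse_on_new_building, xlstm_rmse))
# True: 23.0 kWh sits inside the 10% band, which tops out at ~23.62 kWh
```

If the FC network clears this bar under an identical tuning budget, the extra parameters of the advanced models buy nothing decision-relevant for this use case.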

Figures

Figures reproduced from arXiv: 2605.09722 by Daniel R. Bayer, Marco Pruckner, Marja Wahl, Sven Rausch.

Figure 1
Figure 1: Architecture of the adapted xLSTM. mLSTM and sLSTM blocks are ordered as in the original xLSTM paper [12]; added layers are marked with blue text boxes. The implementation follows the architecture proposed by Beck et al. [12], with minor adaptations for heat-data forecasting.
Figure 2
Figure 2: Average RMSE (kWh) per forecasting step for each model in 24-hour prediction. Here, baseline refers to the naïve forecast.
Figure 3
Figure 3: Example of 24-hour forecasts of the xLSTM for one time series of the test set. The black line shows the recorded values, while the red line shows the 24-hour forecast.
Figure 4
Figure 4: Comparison of average RMSE and MAE across 5 seeds for each model, with 95% confidence intervals.
Figure 5
Figure 5: Per-building evaluation of nRSE distributions for 24-hour prediction, logarithmically scaled. The series are sorted by average energy consumption, ascending from left to right.
read the original abstract

Obtaining an accurate short-term forecasting for heat demand is an essential part of operating district heating networks cost-efficient and reliable. Heat consumption time series at the building level are highly dependent on exogenous variables such as outdoor temperature and individual usage patterns, making forecasting in this context a challenging task. Thus, this paper benchmarks novel Transformer-based and xLSTM architectures for short-term heat-demand forecasting. Using hourly data from 25 German buildings (2017-2025), we compare three-hour and 24-hour forecasting horizons relevant for intraday control and day-ahead scheduling. We establish a multi-building benchmark that tests whether models trained on pooled, heterogeneous building data are able to generalize across diverse building stock. The results show that the xLSTM achieves the lowest RMSE (19.88 kWh for three-hour, 21.47 kWh for 24-hour forecasts), while the Temporal Fusion Transformer attains the best MAE (9.16 kWh for three-hour forecasts). As xLSTMs and Transformers require long training times and have a huge number of trainable parameters, their sustainability remains questionable. Therefore, this paper further investigates the trade-off between predictive accuracy and computational resource demand of the evaluated forecasting models. The findings indicate that also low-parameter models like a traditional fully-connected network achieve good predictive results, highlighting that marginal accuracy gains of the novel prediction models come at substantial resource expense for this use case.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper benchmarks xLSTM, Temporal Fusion Transformer (TFT), other Transformer variants, and a fully-connected (FC) baseline for short-term heat-demand forecasting. Using pooled hourly data from 25 German buildings (2017-2025), it evaluates 3-hour and 24-hour horizons and reports that xLSTM attains the lowest RMSE (19.88 kWh for 3 h, 21.47 kWh for 24 h) while TFT attains the best MAE (9.16 kWh for 3 h); it further claims that low-parameter FC networks achieve competitive accuracy at far lower computational cost, questioning the sustainability of high-parameter models.

Significance. If the experimental protocol is shown to be free of leakage and to apply equal tuning effort across architectures, the work would supply a useful multi-building benchmark for district-heating forecasting and would quantify the accuracy–compute trade-off for this domain. The concrete RMSE/MAE numbers on held-out periods and the explicit comparison with a simple FC baseline are strengths that would support practical deployment decisions.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (Experimental Setup): the headline performance claims rest on models trained on pooled data from 25 heterogeneous buildings, yet no description is given of the temporal train/validation/test split, the forecasting-horizon-aware partitioning, or any safeguards against cross-building or future-data leakage. Without these details the reported margins (xLSTM RMSE 19.88/21.47 kWh, TFT MAE 9.16 kWh) cannot be verified as architectural rather than artifactual.
  2. [§4 and §3.2] §4 (Results) and §3.2 (Hyperparameter protocol): no information is supplied on the hyperparameter search budget, search space, number of trials, or early-stopping rule applied uniformly to xLSTM, TFT, and the FC baseline. Because the central claim is that xLSTM and TFT are superior (or that the margin is small), unequal optimization effort is a load-bearing threat to the ranking.
minor comments (2)
  1. [Table 1] Table 1 (or equivalent results table): column headers for RMSE/MAE should explicitly state the forecast horizon and the unit (kWh) to avoid ambiguity when readers compare 3 h vs. 24 h rows.
  2. [Conclusion] The sustainability discussion in the abstract and conclusion would benefit from a single additional column or row reporting wall-clock training time or parameter count for each model on the same hardware.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We will revise the manuscript to supply the missing details on data partitioning and hyperparameter tuning, thereby improving transparency and allowing readers to verify that the reported performance differences are not artifacts.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Experimental Setup): the headline performance claims rest on models trained on pooled data from 25 heterogeneous buildings, yet no description is given of the temporal train/validation/test split, the forecasting-horizon-aware partitioning, or any safeguards against cross-building or future-data leakage. Without these details the reported margins (xLSTM RMSE 19.88/21.47 kWh, TFT MAE 9.16 kWh) cannot be verified as architectural rather than artifactual.

    Authors: We agree that a clear description of the data partitioning is essential to confirm the absence of leakage. In the revised manuscript we will expand Section 3 with a dedicated subsection that specifies the exact temporal train/validation/test split, the forecasting-horizon-aware windowing procedure, and the explicit safeguards implemented against cross-building and future-data leakage. revision: yes

  2. Referee: [§4 and §3.2] §4 (Results) and §3.2 (Hyperparameter protocol): no information is supplied on the hyperparameter search budget, search space, number of trials, or early-stopping rule applied uniformly to xLSTM, TFT, and the FC baseline. Because the central claim is that xLSTM and TFT are superior (or that the margin is small), unequal optimization effort is a load-bearing threat to the ranking.

    Authors: We acknowledge that the hyperparameter optimization protocol must be documented in detail to support the fairness of the comparisons. In the revision we will augment Section 3.2 with the search budget, search spaces, number of trials, optimization method, and early-stopping rule, confirming that the same protocol was applied to xLSTM, TFT, and the FC baseline. revision: yes
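The uniform protocol the rebuttal promises can be enforced mechanically by routing every architecture through one search routine with a fixed budget, space, and seed. A sketch with hypothetical placeholder values, not the paper's actual protocol:

```python
import random

# One shared protocol applied identically to every architecture. The
# budget and search space here are hypothetical, not values from the paper.
PROTOCOL = {
    "n_trials": 50,
    "space": {"lr": [1e-4, 3e-4, 1e-3], "layers": [1, 2, 3]},
}

def search(model_name, evaluate, protocol=PROTOCOL, seed=0):
    """Random search with a fixed trial budget: the same budget, space,
    and seed for every model, so no architecture gets extra tuning effort."""
    rng = random.Random(seed)
    best_score, best_cfg = float("inf"), None
    for _ in range(protocol["n_trials"]):
        cfg = {k: rng.choice(v) for k, v in protocol["space"].items()}
        score = evaluate(model_name, cfg)
        if score < best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg

# Toy objective standing in for validation RMSE on the pooled data.
def fake_eval(name, cfg):
    return cfg["lr"] * 1000 + cfg["layers"]

score, cfg = search("xlstm", fake_eval)
print(score, cfg)
```

Logging the protocol dictionary alongside the results table would let readers verify the equal-effort claim directly, which is the substance of the referee's second major comment.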

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking on external data

full rationale

The paper is an empirical benchmarking study that trains and evaluates forecasting models (xLSTM, TFT, FC) on hourly heat consumption data from 25 buildings and reports test-set metrics (RMSE, MAE) for 3h and 24h horizons. No derivation chain, first-principles equations, or fitted parameters are presented that could reduce to the inputs by construction. All claims rest on standard train/test splits and performance numbers computed on held-out data, making the work self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central empirical claims rest on standard machine-learning assumptions about time-series modeling and on the representativeness of the 25-building German dataset; no new entities are postulated.

free parameters (1)
  • model-specific hyperparameters
    Learning rates, layer counts, attention heads, and training epochs for each architecture are chosen during optimization and directly affect the reported RMSE/MAE values.
axioms (1)
  • domain assumption Hourly heat-consumption series can be adequately modeled by recurrent and attention-based architectures without additional domain-specific feature engineering.
    Invoked by the choice to benchmark xLSTM and Transformer variants directly on raw or minimally processed time series.

pith-pipeline@v0.9.0 · 5553 in / 1289 out tokens · 55268 ms · 2026-05-12T03:57:09.658039+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

  1. [1]

    Energy and climate

German Federal Ministry for Economic Cooperation and Development, “Energy and climate.” https://www.bmz.de/en/issues/climate-change-and-development/energy-and-climate, 2025

  2. [2]

    Why heat is a challenge in the fight against climate change, and what we can do about it,

    T. Pan, “Why heat is a challenge in the fight against climate change, and what we can do about it,” 2023

  3. [3]

    Design and implementation of a decision integration system for monitoring and optimizing heating systems: Results and lessons learned,

K. Djebko, D. Weidner, M. Waleska, T. Krey, B. Kamble, S. Rausch, D. Seipel, and F. Puppe, “Design and implementation of a decision integration system for monitoring and optimizing heating systems: Results and lessons learned,” Energies, vol. 17, no. 24, p. 6290, 2024

  4. [4]

    An operational strategy for district heating networks: application of energetic flexibility,

A. Vandermeulen, B. van der Heijde, D. Patteeuw, D. Vanhoudt, and L. Helsen, “An operational strategy for district heating networks: application of energetic flexibility,” Energy Informatics, vol. 3, no. 1, pp. 1–17, 2020

  5. [5]

    Cooling, heating and electric load forecasting for integrated energy system based on lstm-cnn,

Y. Jiang, Q. Wan, X. Yang, Y. Xu, and Y. Wang, “Cooling, heating and electric load forecasting for integrated energy system based on lstm-cnn,” in 2021 IEEE 4th International Electrical and Energy Conference (CIEEC), pp. 1–6, IEEE, 2021

  6. [6]

    Optimizing energy efficiency: predicting heating load with a machine learning approach and meta-heuristic algorithms,

Y. Feng, “Optimizing energy efficiency: predicting heating load with a machine learning approach and meta-heuristic algorithms,” Multiscale and Multidisciplinary Modeling, Experiments and Design, vol. 7, pp. 3993–4009, 2024

  7. [7]

    Time-series power forecasting for wind and solar energy based on the sl-transformer,

J. Zhu, Z. Zhao, X. Zheng, Z. An, Q. Guo, Z. Li, J. Sun, and Y. Guo, “Time-series power forecasting for wind and solar energy based on the sl-transformer,” Energies, vol. 16, no. 22, p. 7610, 2023

  8. [8]

    Electricity demand forecasting in future grid states: A digital twin-based simulation study,

D. R. Bayer, F. Haag, M. Pruckner, and K. Hopf, “Electricity demand forecasting in future grid states: A digital twin-based simulation study,” in 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech), pp. 1–6, IEEE, 2024

  9. [9]

Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,

    A. Sherstinsky, “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,” Physica D: Nonlinear Phenomena, vol. 404, p. 132306, 2020

  10. [10]

    Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017

  11. [11]

    Informer: Beyond efficient transformer for long sequence time-series forecasting,

H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11106–11115, 2021

  12. [12]

xLSTM: Extended long short-term memory,

    M. Beck, K. Pöppel, M. Spanring, A. Auer, O. Prudnikova, M. Kopp, G. Klambauer, J. Brandstetter, and S. Hochreiter, “xLSTM: Extended long short-term memory,” Advances in Neural Information Processing Systems, vol. 37, pp. 107547–107603, 2024

  13. [13]

Long Range Arena: A benchmark for efficient transformers,

    Y. Tay, M. Dehghani, S. Abnar, Y. Shen, D. Bahri, P. Pham, J. Rao, L. Yang, S. Ruder, and D. Metzler, “Long Range Arena: A benchmark for efficient transformers,” in International Conference on Learning Representations, 2021

  14. [14]

    Energy Forecasting: A Review and Outlook,

T. Hong, P. Pinson, Y. Wang, R. Weron, D. Yang, and H. Zareipour, “Energy Forecasting: A Review and Outlook,” IEEE Open Access Journal of Power and Energy, vol. 7, pp. 376–388, 2020

  15. [15]

Short-Term Forecasting of Heat Demand of Buildings for Efficient and Optimal Energy Management Based on Integrated Machine Learning Models,

    A. T. Eseye and M. Lehtonen, “Short-Term Forecasting of Heat Demand of Buildings for Efficient and Optimal Energy Management Based on Integrated Machine Learning Models,” IEEE Transactions on Industrial Informatics, vol. 16, pp. 7743–7755, Dec. 2020

  16. [16]

    Machine-learning-based multi-step heat demand forecasting in a district heating system,

P. Potočnik, P. Škerl, and E. Govekar, “Machine-learning-based multi-step heat demand forecasting in a district heating system,” Energy and Buildings, vol. 233, p. 110673, Feb. 2021

  17. [17]

    Modeling and short-term prediction of HVAC system with a clustering algorithm,

F. Tang, A. Kusiak, and X. Wei, “Modeling and short-term prediction of HVAC system with a clustering algorithm,” Energy and Buildings, vol. 82, pp. 310–321, Oct. 2014

  18. [18]

    Heat load prediction through recurrent neural network in district heating and cooling systems,

K. Kato, M. Sakawa, K. Ishimaru, S. Ushiro, and T. Shibano, “Heat load prediction through recurrent neural network in district heating and cooling systems,” in 2008 IEEE International Conference on Systems, Man and Cybernetics, pp. 1401–1406, Oct. 2008. ISSN: 1062-922X

  19. [19]

    Short-Term Load Forecasts Using LSTM Networks,

    S. Muzaffar and A. Afshari, “Short-Term Load Forecasts Using LSTM Networks,” Energy Procedia, vol. 158, pp. 2922–2927, Feb. 2019

  20. [20]

    Assessment of long short-term memory and its modifications for enhanced short-term building energy predictions,

    G. Li, X. Zhao, C. Fan, X. Fang, F. Li, and Y. Wu, “Assessment of long short-term memory and its modifications for enhanced short-term building energy predictions,” Journal of Building Engineering, vol. 43, p. 103182, Nov. 2021

  21. [21]

Multi-step ahead forecasting of heat load in district heating systems using machine learning algorithms,

    P. Xue, Y. Jiang, Z. Zhou, X. Chen, X. Fang, and J. Liu, “Multi-step ahead forecasting of heat load in district heating systems using machine learning algorithms,” Energy, vol. 188, p. 116085, Dec. 2019

  22. [22]

    A transformer-based framework for multivariate time series representation learning,

G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD ’21, (New York, NY, USA), pp. 2114–2124, Association for Computing Machinery, 2021

  23. [23]

    Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting,

H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting,” in Advances in Neural Information Processing Systems, vol. 34, pp. 22419–22430, Curran Associates, Inc., 2021

  24. [24]

    Short-Term Load Forecasting Based on the Transformer Model,

Z. Zhao, C. Xia, L. Chi, X. Chang, W. Li, T. Yang, and A. Y. Zomaya, “Short-Term Load Forecasting Based on the Transformer Model,” Information, vol. 12, p. 516, Dec. 2021

  25. [25]

    Short-Term Electrical Load Forecasting Based on Time Augmented Transformer,

G. Zhang, C. Wei, C. Jing, and Y. Wang, “Short-Term Electrical Load Forecasting Based on Time Augmented Transformer,” International Journal of Computational Intelligence Systems, vol. 15, p. 67, Aug. 2022

  26. [26]

    Short-Term Electricity Load Forecasting Based on Temporal Fusion Transformer Model,

P. C. Huy, N. Q. Minh, N. D. Tien, and T. T. Q. Anh, “Short-Term Electricity Load Forecasting Based on Temporal Fusion Transformer Model,” IEEE Access, vol. 10, pp. 106296–106304, 2022

  27. [27]

    Short-term electricity load forecasting using the temporal fusion transformer: Effect of grid hierarchies and data sources,

    E. Giacomazzi, F. Haag, and K. Hopf, “Short-term electricity load forecasting using the temporal fusion transformer: Effect of grid hierarchies and data sources,” in Proceedings of the 14th ACM International Conference on Future Energy Systems, pp. 353–360, 2023

  28. [28]

xLSTMTime: Long-term time series forecasting with xLSTM,

    M. Alharthi and A. Mahmood, “xLSTMTime: Long-term time series forecasting with xLSTM,” AI, vol. 5, no. 3, pp. 1482–1495, 2024

  29. [29]

    xLSTM-mixer: Multivariate time series forecasting by mixing via scalar memories,

    M. Kraus, F. Divo, D. S. Dhami, and K. Kersting, “xLSTM-mixer: Multivariate time series forecasting by mixing via scalar memories,” 2024

  30. [30]

    Long short-term memory,

S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997

  31. [31]

    Temporal fusion transformers for interpretable multi-horizon time series forecasting,

B. Lim, S. O. Arık, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021

  32. [32]

AI in district heating

    “AI in district heating.” https://future-energy-lab.de/projects/ki-in-fernwaerme/. Accessed: 23.05.2025

  33. [33]

    Modeling and forecasting building energy consumption: A review of data-driven techniques,

M. Bourdeau, X. Q. Zhai, E. Nefzaoui, X. Guo, and P. Chatellier, “Modeling and forecasting building energy consumption: A review of data-driven techniques,” Sustainable Cities and Society, vol. 48, p. 101533, July 2019

  34. [34]

    Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

  35. [35]

Understanding the difficulty of training deep feedforward neural networks,

    X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256, JMLR Workshop and Conference Proceedings, 2010

  36. [36]

Operational demand forecasting in district heating systems using ensembles of online machine learning algorithms,

    C. Johansson, M. Bergkvist, D. Geysen, O. D. Somer, N. Lavesson, and D. Vanhoudt, “Operational demand forecasting in district heating systems using ensembles of online machine learning algorithms,” Energy Procedia, vol. 116, pp. 208–216,

  37. [37]

15th International Symposium on District Heating and Cooling, DHC15-2016, 4-7 September 2016, Seoul, South Korea