Benchmarking Transformer and xLSTM for Time-Series Forecasting of Heat Consumption
Pith reviewed 2026-05-12 03:57 UTC · model grok-4.3
The pith
xLSTM delivers the lowest RMSE for building heat demand forecasts while simpler networks match accuracy at far lower computational cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On pooled hourly data from 25 heterogeneous buildings, the xLSTM architecture records the lowest root mean square errors of 19.88 kWh for three-hour forecasts and 21.47 kWh for 24-hour forecasts, while the Temporal Fusion Transformer attains the lowest mean absolute error of 9.16 kWh at the three-hour horizon. These models are trained to generalize across diverse building stocks rather than on single structures. The work also measures training duration and trainable parameter counts, revealing that the advanced architectures require substantially more resources than a traditional fully connected network, which still delivers good predictive performance for the same forecasting tasks.
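The RMSE and MAE figures quoted above follow the standard definitions, and the two metrics reward different things: RMSE penalizes large misses quadratically, while MAE weighs every kWh of error equally, which is why different architectures can lead on each. A minimal sketch on hypothetical hourly demand values:

```python
import math

def rmse(y_true, y_pred):
    # Root mean square error: large misses are penalized quadratically.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error: every kWh of deviation weighs equally.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

actual   = [120.0, 80.0, 100.0, 60.0]  # hypothetical hourly heat demand, kWh
forecast = [110.0, 95.0,  90.0, 70.0]

print(rmse(actual, forecast))  # 11.456... kWh
print(mae(actual, forecast))   # 11.25 kWh
```

A model with a few large errors and many small ones can thus post a competitive MAE while losing on RMSE, consistent with TFT leading on MAE and xLSTM on RMSE.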
What carries the argument
The pooled multi-building training benchmark that evaluates cross-building generalization together with paired measurement of forecast error against training time and parameter count.
If this is right
- xLSTM records the lowest RMSE for both three-hour and 24-hour forecast horizons.
- The Temporal Fusion Transformer records the lowest MAE for three-hour forecasts.
- A traditional fully connected network achieves competitive accuracy with far fewer parameters and shorter training times.
- The marginal accuracy gains of the advanced models come at substantial increases in computational resource demand.
- Sustainability concerns arise for xLSTM and Transformer models in practical district heating applications because of their training costs.
Where Pith is reading between the lines
- The pattern of diminishing returns from complex sequence models may appear in other forecasting tasks that include strong exogenous drivers such as temperature.
- Scaling the benchmark to additional buildings or longer time spans would test whether the observed resource-accuracy trade-off persists.
- If simpler models generalize reliably, operators of district heating networks could reduce the energy footprint of maintaining forecast systems.
Load-bearing premise
Training on pooled data from 25 heterogeneous buildings produces models that generalize fairly across the building stock without hidden data leakage or unequal hyperparameter tuning across architectures.
What would settle it
Retraining every architecture on the same 25-building pool with an identical hyperparameter search budget and then measuring whether the fully connected network's RMSE on a new test building stays within 10 percent of the xLSTM value.
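The criterion is mechanical to check once the retraining has been run. A sketch of the within-10-percent test (only the 19.88 kWh xLSTM figure comes from the paper; the fully connected network's values are hypothetical outcomes of the proposed experiment):

```python
def fc_within_tolerance(rmse_fc, rmse_xlstm, tolerance=0.10):
    # The settling criterion: does the fully connected network's RMSE on a
    # new test building stay within `tolerance` (relative) of the xLSTM's?
    return rmse_fc <= rmse_xlstm * (1.0 + tolerance)

# Only the 19.88 kWh xLSTM figure comes from the paper; the FC values
# are hypothetical outcomes of the proposed retraining experiment.
print(fc_within_tolerance(21.5, 19.88))  # True  (21.5 <= 21.868)
print(fc_within_tolerance(22.5, 19.88))  # False (22.5 >  21.868)
```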
Original abstract
Obtaining accurate short-term forecasts of heat demand is an essential part of operating district heating networks cost-efficiently and reliably. Heat consumption time series at the building level are highly dependent on exogenous variables such as outdoor temperature and individual usage patterns, making forecasting in this context a challenging task. This paper therefore benchmarks novel Transformer-based and xLSTM architectures for short-term heat-demand forecasting. Using hourly data from 25 German buildings (2017-2025), we compare three-hour and 24-hour forecasting horizons relevant for intraday control and day-ahead scheduling. We establish a multi-building benchmark that tests whether models trained on pooled, heterogeneous building data are able to generalize across a diverse building stock. The results show that the xLSTM achieves the lowest RMSE (19.88 kWh for three-hour, 21.47 kWh for 24-hour forecasts), while the Temporal Fusion Transformer attains the best MAE (9.16 kWh for three-hour forecasts). As xLSTMs and Transformers require long training times and have a huge number of trainable parameters, their sustainability remains questionable. This paper therefore further investigates the trade-off between predictive accuracy and computational resource demand of the evaluated forecasting models. The findings indicate that low-parameter models such as a traditional fully-connected network also achieve good predictive results, highlighting that the marginal accuracy gains of the novel prediction models come at substantial resource expense for this use case.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper benchmarks xLSTM, Temporal Fusion Transformer (TFT), other Transformer variants, and a fully-connected (FC) baseline for short-term heat-demand forecasting. Using pooled hourly data from 25 German buildings (2017-2025), it evaluates 3-hour and 24-hour horizons and reports that xLSTM attains the lowest RMSE (19.88 kWh for 3 h, 21.47 kWh for 24 h) while TFT attains the best MAE (9.16 kWh for 3 h); it further claims that low-parameter FC networks achieve competitive accuracy at far lower computational cost, questioning the sustainability of high-parameter models.
Significance. If the experimental protocol is shown to be free of leakage and to apply equal tuning effort, the work would supply a useful multi-building benchmark for district-heating forecasting and would usefully quantify the accuracy–compute trade-off for this domain. The concrete RMSE/MAE numbers on held-out periods and the explicit comparison with a simple FC baseline are strengths that would support practical deployment decisions.
Major comments (2)
- [Abstract and §3] Abstract and §3 (Experimental Setup): the headline performance claims rest on models trained on pooled data from 25 heterogeneous buildings, yet no description is given of the temporal train/validation/test split, the forecasting-horizon-aware partitioning, or any safeguards against cross-building or future-data leakage. Without these details the reported margins (xLSTM RMSE 19.88/21.47 kWh, TFT MAE 9.16 kWh) cannot be verified as architectural rather than artifactual.
- [§4 and §3.2] §4 (Results) and §3.2 (Hyperparameter protocol): no information is supplied on the hyperparameter search budget, search space, number of trials, or early-stopping rule applied uniformly to xLSTM, TFT, and the FC baseline. Because the central claim is that xLSTM and TFT are superior (or that the margin is small), unequal optimization effort is a load-bearing threat to the ranking.
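One concrete shape the requested leakage safeguard could take (illustrative only; the paper's actual partitioning is unspecified) is a horizon-aware temporal split that admits a window to a partition only if both its input and its forecast target fit entirely inside that partition:

```python
def horizon_aware_split(n_hours, lookback, horizon, train_frac=0.7, val_frac=0.15):
    # Each sample uses hours [s, s+lookback) as input and
    # [s+lookback, s+lookback+horizon) as forecast target.
    span = lookback + horizon
    train_end = int(n_hours * train_frac)
    val_end = int(n_hours * (train_frac + val_frac))
    # A window is admitted to a partition only if it fits entirely inside it,
    # so no input or target ever crosses into a later partition.
    train = list(range(0, train_end - span + 1))
    val = list(range(train_end, val_end - span + 1))
    test = list(range(val_end, n_hours - span + 1))
    return train, val, test

train, val, test = horizon_aware_split(n_hours=1000, lookback=48, horizon=24)
# The last training window ends before the first validation window begins:
assert max(train) + 48 + 24 <= min(val)
assert max(val) + 48 + 24 <= min(test)
```

Documenting a scheme of this kind (or whatever the authors actually used) would let readers verify that no future hours leak into training.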
Minor comments (2)
- [Table 1] Table 1 (or equivalent results table): column headers for RMSE/MAE should explicitly state the forecast horizon and the unit (kWh) to avoid ambiguity when readers compare 3 h vs. 24 h rows.
- [Conclusion] The sustainability discussion in the abstract and conclusion would benefit from a single additional column or row reporting wall-clock training time or parameter count for each model on the same hardware.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We will revise the manuscript to supply the missing details on data partitioning and hyperparameter tuning, thereby improving transparency and allowing readers to verify that the reported performance differences are not artifacts.
Point-by-point responses
- Referee: [Abstract and §3] Abstract and §3 (Experimental Setup): the headline performance claims rest on models trained on pooled data from 25 heterogeneous buildings, yet no description is given of the temporal train/validation/test split, the forecasting-horizon-aware partitioning, or any safeguards against cross-building or future-data leakage. Without these details the reported margins (xLSTM RMSE 19.88/21.47 kWh, TFT MAE 9.16 kWh) cannot be verified as architectural rather than artifactual.
  Authors: We agree that a clear description of the data partitioning is essential to confirm the absence of leakage. In the revised manuscript we will expand Section 3 with a dedicated subsection that specifies the exact temporal train/validation/test split, the forecasting-horizon-aware windowing procedure, and the explicit safeguards implemented against cross-building and future-data leakage. Revision: yes.
- Referee: [§4 and §3.2] §4 (Results) and §3.2 (Hyperparameter protocol): no information is supplied on the hyperparameter search budget, search space, number of trials, or early-stopping rule applied uniformly to xLSTM, TFT, and the FC baseline. Because the central claim is that xLSTM and TFT are superior (or that the margin is small), unequal optimization effort is a load-bearing threat to the ranking.
  Authors: We acknowledge that the hyperparameter optimization protocol must be documented in detail to support the fairness of the comparisons. In the revision we will augment Section 3.2 with the search budget, search spaces, number of trials, optimization method, and early-stopping rule, confirming that the same protocol was applied to xLSTM, TFT, and the FC baseline. Revision: yes.
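A fixed-budget random search applied identically to every architecture is one way to operationalize the promised protocol. A sketch (illustrative; not the paper's documented procedure, and the objective below is a toy stand-in for an actual training run):

```python
import random

def equal_budget_search(train_and_eval, search_space, n_trials=20, seed=0):
    # Random search with a fixed trial budget and seed. Applying the same
    # budget and seed policy to xLSTM, TFT, and the FC baseline equalizes
    # tuning effort across architectures.
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in search_space.items()}
        score = train_and_eval(cfg)  # e.g. validation RMSE in kWh
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective standing in for an actual training run:
space = {"lr": [1e-4, 1e-3, 1e-2], "hidden": [32, 64, 128]}
best, rmse_val = equal_budget_search(
    lambda cfg: abs(cfg["lr"] - 1e-3) * 100 + cfg["hidden"] / 64,
    space, n_trials=10)
```

Reporting the budget, search space, and seed policy per architecture would let readers confirm the comparison was even-handed.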
Circularity Check
No circularity: purely empirical benchmarking on external data
Full rationale
The paper is an empirical benchmarking study that trains and evaluates forecasting models (xLSTM, TFT, FC) on hourly heat consumption data from 25 buildings and reports test-set metrics (RMSE, MAE) for 3h and 24h horizons. No derivation chain, first-principles equations, or fitted parameters are presented that could reduce to the inputs by construction. All claims rest on standard train/test splits and performance numbers computed on held-out data, making the work self-contained and externally falsifiable.
Axiom & Free-Parameter Ledger
Free parameters (1)
- model-specific hyperparameters
Axioms (1)
- Domain assumption: Hourly heat-consumption series can be adequately modeled by recurrent and attention-based architectures without additional domain-specific feature engineering.
Reference graph
Works this paper leans on
[1] German Federal Ministry for Economic Cooperation and Development, "Energy and climate." https://www.bmz.de/en/issues/climate-change-and-development/energy-and-climate, 2025.
[2] T. Pan, "Why heat is a challenge in the fight against climate change, and what we can do about it," 2023.
[3] K. Djebko, D. Weidner, M. Waleska, T. Krey, B. Kamble, S. Rausch, D. Seipel, and F. Puppe, "Design and implementation of a decision integration system for monitoring and optimizing heating systems: Results and lessons learned," Energies, vol. 17, no. 24, p. 6290, 2024.
[4] A. Vandermeulen, B. van der Heijde, D. Patteeuw, D. Vanhoudt, and L. Helsen, "An operational strategy for district heating networks: application of energetic flexibility," Energy Informatics, vol. 3, no. 1, pp. 1–17, 2020.
[5] Y. Jiang, Q. Wan, X. Yang, Y. Xu, and Y. Wang, "Cooling, heating and electric load forecasting for integrated energy system based on lstm-cnn," in 2021 IEEE 4th International Electrical and Energy Conference (CIEEC), pp. 1–6, IEEE, 2021.
[6] Y. Feng, "Optimizing energy efficiency: predicting heating load with a machine learning approach and meta-heuristic algorithms," Multiscale and Multidisciplinary Modeling, Experiments and Design, vol. 7, pp. 3993–4009, 2024.
[7] J. Zhu, Z. Zhao, X. Zheng, Z. An, Q. Guo, Z. Li, J. Sun, and Y. Guo, "Time-series power forecasting for wind and solar energy based on the sl-transformer," Energies, vol. 16, no. 22, p. 7610, 2023.
[8] D. R. Bayer, F. Haag, M. Pruckner, and K. Hopf, "Electricity demand forecasting in future grid states: A digital twin-based simulation study," in 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech), pp. 1–6, IEEE, 2024.
[9] A. Sherstinsky, "Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network," Physica D: Nonlinear Phenomena, vol. 404, p. 132306, 2020.
[10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[11] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond efficient transformer for long sequence time-series forecasting," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11106–11115, 2021.
[12] M. Beck, K. Pöppel, M. Spanring, A. Auer, O. Prudnikova, M. Kopp, G. Klambauer, J. Brandstetter, and S. Hochreiter, "xLSTM: Extended long short-term memory," Advances in Neural Information Processing Systems, vol. 37, pp. 107547–107603, 2024.
[13] Y. Tay, M. Dehghani, S. Abnar, Y. Shen, D. Bahri, P. Pham, J. Rao, L. Yang, S. Ruder, and D. Metzler, "Long range arena: A benchmark for efficient transformers," in International Conference on Learning Representations, 2021.
[14] T. Hong, P. Pinson, Y. Wang, R. Weron, D. Yang, and H. Zareipour, "Energy Forecasting: A Review and Outlook," IEEE Open Access Journal of Power and Energy, vol. 7, pp. 376–388, 2020.
[15] A. T. Eseye and M. Lehtonen, "Short-Term Forecasting of Heat Demand of Buildings for Efficient and Optimal Energy Management Based on Integrated Machine Learning Models," IEEE Transactions on Industrial Informatics, vol. 16, pp. 7743–7755, Dec. 2020.
[16] P. Potočnik, P. Škerl, and E. Govekar, "Machine-learning-based multi-step heat demand forecasting in a district heating system," Energy and Buildings, vol. 233, p. 110673, Feb. 2021.
[17] F. Tang, A. Kusiak, and X. Wei, "Modeling and short-term prediction of HVAC system with a clustering algorithm," Energy and Buildings, vol. 82, pp. 310–321, Oct. 2014.
[18] K. Kato, M. Sakawa, K. Ishimaru, S. Ushiro, and T. Shibano, "Heat load prediction through recurrent neural network in district heating and cooling systems," in 2008 IEEE International Conference on Systems, Man and Cybernetics, pp. 1401–1406, Oct. 2008.
[19] S. Muzaffar and A. Afshari, "Short-Term Load Forecasts Using LSTM Networks," Energy Procedia, vol. 158, pp. 2922–2927, Feb. 2019.
[20] G. Li, X. Zhao, C. Fan, X. Fang, F. Li, and Y. Wu, "Assessment of long short-term memory and its modifications for enhanced short-term building energy predictions," Journal of Building Engineering, vol. 43, p. 103182, Nov. 2021.
[21] P. Xue, Y. Jiang, Z. Zhou, X. Chen, X. Fang, and J. Liu, "Multi-step ahead forecasting of heat load in district heating systems using machine learning algorithms," Energy, vol. 188, p. 116085, Dec. 2019.
[22] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, "A transformer-based framework for multivariate time series representation learning," in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD '21, pp. 2114–2124, Association for Computing Machinery, 2021.
[23] H. Wu, J. Xu, J. Wang, and M. Long, "Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting," in Advances in Neural Information Processing Systems, vol. 34, pp. 22419–22430, Curran Associates, Inc., 2021.
[24] Z. Zhao, C. Xia, L. Chi, X. Chang, W. Li, T. Yang, and A. Y. Zomaya, "Short-Term Load Forecasting Based on the Transformer Model," Information, vol. 12, p. 516, Dec. 2021.
[25] G. Zhang, C. Wei, C. Jing, and Y. Wang, "Short-Term Electrical Load Forecasting Based on Time Augmented Transformer," International Journal of Computational Intelligence Systems, vol. 15, p. 67, Aug. 2022.
[26] P. C. Huy, N. Q. Minh, N. D. Tien, and T. T. Q. Anh, "Short-Term Electricity Load Forecasting Based on Temporal Fusion Transformer Model," IEEE Access, vol. 10, pp. 106296–106304, 2022.
[27] E. Giacomazzi, F. Haag, and K. Hopf, "Short-term electricity load forecasting using the temporal fusion transformer: Effect of grid hierarchies and data sources," in Proceedings of the 14th ACM International Conference on Future Energy Systems, pp. 353–360, 2023.
[28] M. Alharthi and A. Mahmood, "xLSTMTime: Long-term time series forecasting with xLSTM," AI, vol. 5, no. 3, pp. 1482–1495, 2024.
[29] M. Kraus, F. Divo, D. S. Dhami, and K. Kersting, "xLSTM-Mixer: Multivariate time series forecasting by mixing via scalar memories," 2024.
[30] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[31] B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, "Temporal fusion transformers for interpretable multi-horizon time series forecasting," International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021.
[32] "AI in district heating." https://future-energy-lab.de/projects/ki-in-fernwaerme/. Accessed: 23.05.2025.
[33] M. Bourdeau, X. Q. Zhai, E. Nefzaoui, X. Guo, and P. Chatellier, "Modeling and forecasting building energy consumption: A review of data-driven techniques," Sustainable Cities and Society, vol. 48, p. 101533, July 2019.
[34] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[35] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256, JMLR Workshop and Conference Proceedings, 2010.
[36] C. Johansson, M. Bergkvist, D. Geysen, O. D. Somer, N. Lavesson, and D. Vanhoudt, "Operational demand forecasting in district heating systems using ensembles of online machine learning algorithms," Energy Procedia, vol. 116, pp. 208–216 (15th International Symposium on District Heating and Cooling, DHC15-2016, 4–7 September 2016, Seoul, South Korea).