pith. machine review for the scientific record.

arxiv: 2604.16443 · v1 · submitted 2026-04-07 · 📡 eess.SY · cs.LG · cs.SY

Recognition: 2 theorem links


Thermal-GEMs: Generalized Models for Building Thermal Dynamics

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:11 UTC · model grok-4.3

classification 📡 eess.SY · cs.LG · cs.SY
keywords transfer learning · building thermal dynamics · multi-source models · time series forecasting · foundation models · energy modeling · forecast error reduction

The pith

Multi-source transfer learning models reduce building thermal forecasting errors by up to 63% versus single-source methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper evaluates ways to build accurate data-driven models for how buildings gain and lose heat, a key step toward energy-efficient operation without needing long measurement records from every new site. It tests multi-source transfer learning, where models are pretrained on thermal data from several buildings at once, against both single-source transfer learning and time series foundation models trained on many kinds of data. Ablation studies with four established multi-source architectures and tests on synthetic and real-world datasets show clear gains from the multi-source approach. The work also identifies a practical threshold: pretraining on data from 16 to 32 buildings collected over one year allows these specialized models to beat the foundation models on mean absolute error. The results give concrete advice on which modeling route to pick based on how much source data is on hand.

Core claim

The authors demonstrate that multi-source TL models pretrained on multiple source buildings deliver up to 63% lower forecasting errors than single-source TL when applied to real buildings. They also identify a data-volume trade-off: multi-source TL models require thermal data from 16-32 source buildings spanning one year to consistently outperform TSFMs pretrained on diverse time series, when performance is measured by mean absolute error. These outcomes supply guidance for choosing between building-focused pretraining and general foundation models according to the number of available source buildings.

What carries the argument

Four state-of-the-art multi-source transfer learning architectures pretrained exclusively on building thermal time series, evaluated against time series foundation models through ablations on synthetic and real-world datasets.
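The paper's pipelines are neural architectures, but the pretraining contrast itself can be illustrated with a deliberately simplified sketch. Everything below is hypothetical: a one-parameter linear "building" where the temperature response is a single coefficient with per-building variation, standing in for the real thermal models.

```python
import random

random.seed(0)

def make_building(offset):
    """Toy thermal data: response is proportional to a driving signal,
    with a per-building coefficient (hypothetical, not the paper's model)."""
    w_true = 0.5 + offset
    xs = [random.uniform(-10, 10) for _ in range(200)]
    ys = [w_true * x + random.gauss(0, 0.2) for x in xs]
    return xs, ys

def fit(xs, ys):
    """Least-squares slope through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mae(w, xs, ys):
    """Mean absolute error of the one-parameter model on a dataset."""
    return sum(abs(w * x - y) for x, y in zip(xs, ys)) / len(xs)

# Multi-source pretraining: pool data from 16 source buildings.
sources = [make_building(random.gauss(0, 0.05)) for _ in range(16)]
x_pool = [x for xs, _ in sources for x in xs]
y_pool = [y for _, ys in sources for y in ys]
w_multi = fit(x_pool, y_pool)

# Single-source pretraining: one source building only.
w_single = fit(*sources[0])

# Zero-shot evaluation on an unseen target building.
x_t, y_t = make_building(random.gauss(0, 0.05))
print(f"multi-source MAE:  {mae(w_multi, x_t, y_t):.3f}")
print(f"single-source MAE: {mae(w_single, x_t, y_t):.3f}")
```

The intuition the sketch captures: pooling many source buildings averages out building-specific quirks, so the pretrained parameters sit closer to a typical unseen target than parameters fit to any single source.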

If this is right

  • Multi-source TL models can be deployed directly for accurate real-world building thermal forecasting.
  • When data from 16-32 source buildings over one year is available, multi-source TL should be preferred over TSFMs for lower mean absolute error.
  • Single-source TL is consistently outperformed, confirming the value of pretraining across multiple buildings.
  • Modeling strategy selection for new buildings can be guided by how many source-building datasets are available for pretraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Organizations could pool thermal data across many buildings to create reusable models that lower the data-collection burden for each new site.
  • Hybrid pretraining that mixes building-specific data with broader time series might close remaining performance gaps without needing the full 16-32 building threshold.
  • The identified data threshold may indicate scaling behavior that applies to other specialized time-series domains facing competition from general foundation models.
  • Future evaluations on larger or more diverse building stocks could refine the exact number of source buildings required.

Load-bearing premise

The four chosen multi-source TL architectures and the particular synthetic and real-world datasets tested are representative enough that the observed error reductions and 16-32 building threshold apply to other buildings and operating conditions.

What would settle it

Applying the same models to a fresh collection of buildings with different construction types or climates and finding error reductions well below 63% or no consistent advantage for multi-source TL until far more than 32 buildings are included would falsify the central claims.

Figures

Figures reproduced from arXiv: 2604.16443 by Benjamin Tischler, Fabian Raisch, Felix Koch.

Figure 1. Overview of modeling strategies regarding our re… [image omitted]
Figure 2. Seasonal Fine-Tuning split for the experiments [image omitted]
Figure 3. Zero-Shot MAE of all architectures as MSGMs [image omitted]
Figure 4. MAE of Single-Source TL versus MSGM for different fine-tunings on real-world data (Ecobee). MSGM beats single… [image omitted]
Figure 6. Zero-shot performance: Per-building TSFM com… [image omitted]
Figure 7. Zero-shot performance: Influence of the number of sources on the MSGM for buildings. The best-performing TSFM… [image omitted]
Figure 8. RMSE of Single-Source TL versus MSGM for different fine-tunings on real-world data (Ecobee). MSGM beats single… [image omitted]
Figure 9. Zero-shot performance RMSE: Influence of the number of sources on the MSGM for buildings. The best-performing… [image omitted]
Original abstract

Data-driven models for building thermal dynamics are a scalable approach for enabling energy-efficient operation through fault detection & diagnosis or advanced control. To obtain accurate models, measurement data from a target building spanning months to years are required. Transfer Learning (TL) mitigates this challenge by employing pretrained models based on single or multiple source buildings. General multi-source TL models promise to outperform single-source TL, but alternative multi-source modeling architectures remain to be explored, and evaluation on real-world data is missing. Moreover, time series foundation models (TSFM) have emerged as candidates for the best-performing general models. Hence, we conduct a first, comprehensive assessment of general modeling approaches for building thermal dynamics, including multi-source TL and TSFMs. Our assessment includes ablations using four state-of-the-art multi-source TL architectures and evaluations on synthetic as well as real-world data. We demonstrate that multi-source TL models are highly effective in accurately modeling buildings in real-world applications, yielding up to 63% lower forecasting errors compared to single-source TL. Moreover, our results suggest a trade-off between multi-source TL models exclusively pretrained with building data and TSFMs pretrained with a multitude of different time series, revealing that data from 16-32 source buildings must be available over 1 year for pretraining multi-source TL models to consistently outperform TSFMs as evaluated using the mean absolute error. These findings provide practical guidance for selecting modeling strategies based on the number of source buildings available for pretraining multi-source TL models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript presents a comprehensive empirical assessment of generalized models for building thermal dynamics, comparing multi-source transfer learning (TL) approaches against time series foundation models (TSFMs). It evaluates four state-of-the-art multi-source TL architectures through ablations on synthetic and real-world datasets, claiming up to 63% lower forecasting errors relative to single-source TL and identifying a practical threshold of 16-32 source buildings over 1 year for multi-source TL to consistently outperform TSFMs when measured by mean absolute error.

Significance. If the empirical findings hold under broader conditions, the work provides actionable guidance for practitioners selecting modeling strategies in building energy systems when target data is scarce. A clear strength is the inclusion of ablations across multiple architectures and dual evaluation on synthetic plus real-world data, yielding falsifiable performance numbers from held-out tests rather than self-referential fits. This directly addresses data requirements for accurate thermal dynamics modeling in fault detection and control applications.

major comments (1)
  1. [§5] Multi-source TL vs. TSFM comparison: The central trade-off claim that data from 16-32 source buildings over 1 year is required for multi-source TL to consistently outperform TSFMs rests on evaluations with four specific architectures and one synthetic generator plus a real-world corpus. The manuscript should add sensitivity analyses to source-building sampling strategies and synthetic data fidelity (e.g., effects of occupancy stochasticity or sensor noise) because shifts in these factors could move the MAE crossover point by more than a factor of two, undermining the reported threshold as a general guideline.
minor comments (3)
  1. [Abstract] The stated quantitative results (63% error reduction, 16-32 building threshold) are given without any reference to model architectures, training details, error bars, or data characteristics, making it difficult for readers to assess support for the claims at first reading.
  2. [Methods] Methods section: Clarify how the four chosen multi-source TL architectures were selected and whether they exhaustively represent the space of general multi-source models, as the weakest assumption in the evaluation is their representativeness.
  3. [Results] Results tables/figures: Include statistical significance tests or confidence intervals alongside the MAE values to support statements of 'consistent' outperformance across the 16-32 building range.
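Minor comment 3 asks for confidence intervals on the MAE values. One generic way to produce them is a percentile bootstrap over per-sample absolute errors; the sketch below is illustrative, not the authors' evaluation code, and the `errors` sample is synthetic.

```python
import random

def bootstrap_mae_ci(abs_errors, n_boot=2000, alpha=0.05, seed=1):
    """Percentile-bootstrap confidence interval for the mean absolute error:
    resample the per-sample errors with replacement, recompute the mean each
    time, and take the alpha/2 and 1-alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(abs_errors)
    means = sorted(
        sum(rng.choice(abs_errors) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic per-sample absolute errors standing in for a model's test errors.
gen = random.Random(0)
errors = [abs(gen.gauss(0.4, 0.1)) for _ in range(200)]
lo, hi = bootstrap_mae_ci(errors)
print(f"MAE 95% CI: [{lo:.3f}, {hi:.3f}]")
```

Non-overlapping intervals between two models across the 16-32 source range would directly support the "consistent" outperformance claim; for paired comparisons, bootstrapping the per-sample error differences is the tighter variant.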

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We have carefully considered the major comment and provide our response below. We believe the manuscript can be strengthened by addressing the points raised.

Point-by-point responses
  1. Referee: [§5] Multi-source TL vs. TSFM comparison: The central trade-off claim that data from 16-32 source buildings over 1 year is required for multi-source TL to consistently outperform TSFMs rests on evaluations with four specific architectures and one synthetic generator plus a real-world corpus. The manuscript should add sensitivity analyses to source-building sampling strategies and synthetic data fidelity (e.g., effects of occupancy stochasticity or sensor noise) because shifts in these factors could move the MAE crossover point by more than a factor of two, undermining the reported threshold as a general guideline.

    Authors: We appreciate the referee's concern regarding the robustness of our reported threshold. Our study evaluates the performance across four state-of-the-art multi-source TL architectures using both a synthetic data generator and a real-world dataset from multiple buildings. The real-world data naturally incorporates stochastic occupancy patterns and sensor noise, providing a realistic testbed. The synthetic data is used to control variables and isolate effects. We do not present the 16-32 building threshold as an absolute general guideline but as a practical observation from our experiments, as stated in the abstract ('our results suggest'). To address the comment, we will revise the manuscript to include an expanded discussion on the potential sensitivities to sampling strategies and data fidelity, including qualitative analysis of how variations might affect the crossover point. However, conducting exhaustive quantitative sensitivity analyses would require generating new datasets and retraining models, which is computationally intensive and beyond the current scope; we note this as a limitation and direction for future work. This partial revision maintains the integrity of our empirical findings while acknowledging the referee's valid point.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical held-out evaluation on independent data

Full rationale

The paper conducts ablations and evaluations of multi-source TL architectures and TSFMs on synthetic and real-world building data, reporting forecasting errors (MAE) on held-out test sets. No derivation, equation, or claim reduces by construction to fitted parameters, self-citations, or renamed inputs; the 16-32 building threshold and 63% error reduction are direct outputs of the described experiments rather than tautological restatements of the training procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical machine-learning evaluation study. No mathematical axioms, free parameters, or invented physical entities are introduced or required by the central claims in the abstract.

pith-pipeline@v0.9.0 · 5575 in / 1181 out tokens · 35297 ms · 2026-05-10T19:11:16.810951+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 13 canonical work pages · 5 internal anchors

  [1] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2623–2631.
  [2] Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, et al. 2025. Chronos-2: From Univariate to Universal Forecasting. arXiv preprint arXiv:2510.15821 (2025).
  [3] Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, et al. 2024. Chronos: Learning the language of time series. arXiv preprint arXiv:2403.07815 (2024).
  [4] Javier Arroyo, Carlo Manna, Fred Spiessens, and Lieve Helsen. 2022. Reinforced model predictive control (RL-MPC) for building energy management. Applied Energy 309 (2022), 118346.
  [5] Yasaman Balali, Adrian Chong, Andrew Busch, and Steven O'Keefe. 2023. Energy modelling and control of building heating and cooling systems with data-driven and hybrid models—A review. Renewable and Sustainable Energy Reviews 183 (2023), 113496.
  [6] Michael Batty. 2018. Digital twins. 817–820.
  [7] Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. 2024. xLSTM: Extended long short-term memory. Advances in Neural Information Processing Systems 37 (2024), 107547–107603.
  [8] Gaurav Chaudhary, Hicham Johra, Laurent Georges, and Bjørn Austbø. 2025. Transfer learning in building dynamics prediction. Energy and Buildings 330 (2025), 115384.
  [9]–[10] Yujiao Chen, Zheming Tong, Yang Zheng, Holly Samuelson, and Leslie Norford. Transfer learning with deep neural networks for model predictive control of HVAC and natural ventilation in smart buildings. Journal of Cleaner Production 254 (2020), 119866.
  [11] Zhelun Chen, Zheng O'Neill, Jin Wen, Ojas Pradhan, Tao Yang, Xing Lu, Guanjing Lin, Shohei Miyata, Seungjae Lee, Chou Shen, Roberto Chiosa, Marco Savino Piscitelli, Alfonso Capozzoli, Franz Hengel, Alexander Kührer, Marco Pritoni, Wei Liu, John Clauß, Yimin Chen, and Terry Herr. 2023. A review of data-driven …
  [12] Wonjun Choi and Sangwon Lee. 2023. Performance evaluation of deep learning architectures for load and temperature forecasting under dataset size constraints and seasonality. Energy and Buildings 288 (2023), 113027.
  [13] Ben Cohen, Emaad Khwaja, Youssef Doubli, Salahidine Lemaachi, Chris Lettieri, Charles Masson, Hugo Miccinilli, Elise Ramé, Qiqi Ren, Afshin Rostamizadeh, et al. 2025. This Time is Different: An Observability Perspective on Time Series Foundation Models. arXiv preprint arXiv:2505.14766 (2025).
  [14] Copernicus Climate Change Service (C3S). 2017. ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate. Climate Data Store (CDS). Access via cds.climate.copernicus.eu.
  [15] Davide Coraci, Silvio Brandi, and Alfonso Capozzoli. 2023. Effective pre-training of a deep reinforcement learning agent by means of long short-term memory models for thermal energy management in buildings. Energy Conversion and Management 291 (2023), 117303. doi:10.1016/j.enconman.2023.117303.
  [16] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. 2024. A decoder-only foundation model for time-series forecasting. In Forty-first International Conference on Machine Learning.
  [17] Hongwen Dou and Kun Zhang. 2025. Transfer learning for cross-building forecasting of building energy and indoor air temperature in model predictive control applications. Journal of Building Engineering (2025), 113341.
  [18] Ján Drgoňa, Javier Arroyo, Iago Cupeiro Figueroa, David Blum, Krzysztof Arendt, Donghun Kim, Enric Perarnau Ollé, Juraj Oravec, Michael Wetter, Draguna L Vrabie, et al. 2020. All you need to know about model predictive control for buildings. Annual Reviews in Control 50 (2020), 190–232.
  [19] Zhimin Du, Xinqiao Jin, and Yunyu Yang. 2009. Fault diagnosis for temperature, flow rate and pressure sensors in VAV systems using wavelet neural network. Applied Energy 86, 9 (2009), 1624–1631. doi:10.1016/j.apenergy.2009.01.015.
  [20] Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Sumanta Mukherjee, Nam Nguyen, Wesley M Gifford, Chandra Reddy, and Jayant Kalagnanam. 2024. Tiny time mixers (TTMs): Fast pre-trained models for enhanced zero/few-shot forecasting of multivariate time series. Advances in Neural Information Processing Systems 37 (2024), 74147–74181.
  [21] Furkan Elmaz, Reinout Eyckerman, Wim Casteels, Steven Latré, and Peter Hellinckx. 2021. CNN-LSTM architecture for predictive indoor temperature modeling. Building and Environment 206 (2021), 108327.
  [22] Google Research. 2025. TimesFM: Time Series Foundation Model. https://github.com/google-research/timesfm. Accessed: 2025-12-09.
  [23] Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. 2024. MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning.
  [24] Albert Gu and Tri Dao. 2023. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023).
  [25] Matthias Hertel, Maximilian Beichter, Benedikt Heidrich, Oliver Neumann, Benjamin Schäfer, Ralf Mikut, and Veit Hagenmeyer. 2023. Transformer training strategies for forecasting multiple load time series. Energy Informatics 6, Suppl 1 (2023), 20.
  [26] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
  [27] Diederik P Kingma. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  [28] Thomas Krug, Fabian Raisch, Dominik Aimer, Markus Wirnsberger, Ferdinand Sigg, Felix Koch, Benjamin Schäfer, and Benjamin Tischler. 2025. A Highly Configurable Framework for Large-Scale Thermal Building Data Generation to drive Machine Learning Research. arXiv preprint arXiv:2512.00483 (2025).
  [29] Thomas Krug, Fabian Raisch, Dominik Aimer, Markus Wirnsberger, Ferdinand Sigg, Benjamin Schäfer, and Benjamin Tischler. 2025. Builda: A thermal building data generation framework for transfer learning. In 2025 Annual Modeling and Simulation Conference (ANNSIM). IEEE, 1–13.
  [30] Doyun Lee, Ryozo Ooka, Yuki Matsuda, Shintaro Ikeda, and Wonjun Choi. 2022. Experimental analysis of artificial intelligence-based model predictive control for thermal energy storage under different cooling load conditions. Sustainable Cities and Society 79 (2022), 103700.
  [31] Han Li, Giuseppe Pinto, Marco Savino Piscitelli, Alfonso Capozzoli, and Tianzhen Hong. 2024. Building thermal dynamics modeling with deep transfer learning using a large residential smart thermostat dataset. Engineering Applications of Artificial Intelligence 130 (2024), 107701.
  [32] Rui Liang, Yang Deng, Donghua Xie, Fang He, and Dan Wang. 2024. Enabling Time-series Foundation Model for Building Energy Forecasting via Contrastive Curriculum Learning. arXiv preprint arXiv:2412.17285 (2024).
  [33] Bryan Lim, Sercan Ö Arık, Nicolas Loeff, and Tomas Pfister. 2021. Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting 37, 4 (2021), 1748–1764.
  [34] Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
  [35] Na Luo and Tianzhen Hong. 2022. Ecobee donate your data 1,000 homes in 2017. Technical Report. Pacific Northwest National Lab. (PNNL), Richland, WA (United States).
  [36] Ozan Baris Mulayim, Pengrui Quan, Liying Han, Xiaomin Ouyang, Dezhi Hong, Mario Bergés, and Mani Srivastava. 2024. Are Time Series Foundation Models Ready to Revolutionize Predictive Building Analytics? In Proceedings of the 11th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation. 169–173.
  [37] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In International Conference on Learning Representations.
  [38]–[39] Young-Jin Park, François Germain, Jing Liu, Ye Wang, Toshiaki Koike-Akino, Gordon Wichern, Navid Azizan, Christopher Laughman, and Ankush Chakrabarty. Probabilistic Forecasting for Building Energy Systems using Time-Series Foundation Models. Energy and Buildings (2025), 116446.
  [40] Giuseppe Pinto, Riccardo Messina, Han Li, Tianzhen Hong, Marco Savino Piscitelli, and Alfonso Capozzoli. 2022. Sharing is caring: An extensive analysis of parameter-based transfer learning for the prediction of building thermal dynamics. Energy and Buildings 276 (2022), 112530.
  [41]–[42] Giuseppe Pinto, Zhe Wang, Abhishek Roy, Tianzhen Hong, and Alfonso Capozzoli. Transfer learning for smart buildings: A critical review of algorithms, applications, and future perspectives. Advances in Applied Energy 5 (2022), 100084.
  [43] Martin Pullinger, Jonathan Kilgour, Nigel Goddard, Niklas Berliner, Lynda Webb, Myroslava Dzikovska, Heather Lovell, Janek Mann, Charles Sutton, Janette Webb, et al. 2021. The IDEAL household energy dataset, electricity, gas, contextual sensor data and survey data for 255 UK homes. Scientific Data 8, 1 (2021), 146.
  [44] Fabian Raisch, Thomas Krug, Christoph Goebel, and Benjamin Tischler. 2025. GenTL: A General Transfer Learning Model for Building Thermal Dynamics.
  [45] Fabian Raisch, Max Langtry, Felix Koch, Ruchi Choudhary, Christoph Goebel, and Benjamin Tischler. 2025. Adapting to Change: A Comparison of Continual and Transfer Learning for Modeling Building Thermal Dynamics under Concept Drifts. Energy and Buildings (2025), 116868.
  [46] Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, and Irina Rish. 2024. Lag-Llama: Towards Foundati…
  [47] Skipper Seabold, Josef Perktold, et al. 2010. Statsmodels: econometric and statistical modeling with python. SciPy 7, 1 (2010), 92–96.
  [48] Wei Tian, Yeonsook Heo, Pieter De Wilde, Zhanyong Li, Da Yan, Cheol Soo Park, Xiaohang Feng, and Godfried Augenbroe. 2018. A review of uncertainty analysis in building energy assessment. Renewable and Sustainable Energy Reviews 93 (2018).
  [49] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
  [50] Zihan Wang, Fanheng Kong, Shi Feng, Ming Wang, Xiaocui Yang, Han Zhao, Daling Wang, and Yifei Zhang. 2025. Is mamba effective for time series forecasting? Neurocomputing 619 (2025), 129178.
  [51] Weather Underground. 2023. Daily Weather History for UK. https://www.wunderground.com/history. Accessed: 2025-05-12.
  [52] Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. 2024. Unified Training of Universal Time Series Forecasting Transformers. arXiv preprint arXiv:2402.02592 (2024).
  [53] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems 34 (2021), 22419–22430.
  [54] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11106–11115.

BUILDSYS '26, June 22–25, 2026, Banff, Canada. Received 06 February 2026; accepted 04 April 2026.