pith. machine review for the scientific record.

arxiv: 2604.11529 · v2 · submitted 2026-04-13 · 💻 cs.LG

TempusBench: An Evaluation Framework for Time-Series Forecasting

Pith reviewed 2026-05-10 15:17 UTC · model grok-4.3

classification 💻 cs.LG
keywords time series forecasting · foundation models · evaluation benchmark · non-stationarity · seasonality · hyperparameter tuning · visualization

The pith

TempusBench supplies fresh datasets, novel tasks, standardized tuning, and visualization to evaluate time-series foundation models more reliably.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing benchmarks for time-series forecasting models suffer from data overlap with pretraining sets, limited task coverage that ignores properties like non-stationarity and seasonality, inconsistent hyperparameter tuning across models, and absent visualization tools. TempusBench counters these by releasing new datasets confirmed absent from prior corpora, introducing tasks focused on those statistical features, enforcing uniform tuning protocols, and adding a TensorBoard interface for comparisons. A sympathetic reader would care because reliable evaluation is essential for advancing time-series foundation models beyond current claims of progress. The framework includes a live leaderboard to standardize community assessments.

Core claim

TempusBench is an open-source evaluation framework for time-series foundation models (TSFMs) consisting of: new datasets not included in existing TSFM pretraining corpora; novel benchmark tasks that extend beyond domain and horizon to statistical properties such as non-stationarity and seasonality; a model evaluation pipeline enforcing standardized hyperparameter tuning for all models, including domain-specific ones; and a TensorBoard-based visualization interface.
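The abstract names non-stationarity and seasonality as the properties the new tasks target, but does not define the tasks themselves. As an illustration only — the helpers below are invented for this review, not taken from TempusBench — a task generator could inject controlled non-stationarity into a base series, and score how strongly a series exercises seasonality via autocorrelation at the seasonal lag:

```python
def inject_level_shift(series, at, magnitude):
    """Add a step change from index `at` onward -- a simple controlled
    way to make a series non-stationary in the mean."""
    return [v + (magnitude if i >= at else 0.0) for i, v in enumerate(series)]

def inject_trend(series, slope):
    """Add a linear trend, another minimal non-stationarity injection."""
    return [v + slope * i for i, v in enumerate(series)]

def seasonal_strength(series, period):
    """Sample autocorrelation at the seasonal lag; values near 1 indicate
    the series strongly exercises a seasonality-focused task."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[i] - mean) * (series[i - period] - mean)
              for i in range(period, n))
    return cov / var
```

A benchmark variant would pair each injected series with its original, so a model's sensitivity to the injected property can be isolated from its baseline accuracy.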

What carries the argument

TempusBench, the framework whose four parts—new datasets, novel tasks, standardized tuning protocol, and visualization interface—directly target the four identified evaluation issues.

If this is right

  • Models evaluated under TempusBench avoid unfair advantages from pretraining data leakage.
  • Comparisons now account for performance under non-stationary and seasonal conditions.
  • Domain-specific models receive the same hyperparameter optimization treatment as foundation models.
  • Researchers gain visual tools to better interpret why one model outperforms another.
  • Community benchmarks become reproducible through the open-source code and live leaderboard.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption of TempusBench could accelerate development of more robust time-series models by highlighting weaknesses in current evaluation practices.
  • Similar frameworks might emerge for other domains where foundation model evaluation lacks standardization.
  • Future work could expand the novel tasks to include additional statistical properties like long-range dependencies.

Load-bearing premise

The new datasets are genuinely absent from existing TSFM pretraining corpora and the novel tasks meaningfully capture overlooked statistical properties such as non-stationarity and seasonality better than prior benchmarks.
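The first half of this premise is checkable in principle. A minimal sketch of an overlap audit — hypothetical helpers, not the authors' actual verification code — fingerprints each candidate series by hashing its rounded values, then tests for exact matches and time-span intersections against a pretraining-corpus index:

```python
from hashlib import sha256

def series_fingerprint(values, ndigits=6):
    """Hash a numeric series after rounding, so an exact copy in a
    pretraining corpus yields the same fingerprint despite float noise."""
    canon = ",".join(f"{float(v):.{ndigits}f}" for v in values)
    return sha256(canon.encode()).hexdigest()

def spans_intersect(a, b):
    """True if two (start, end) intervals overlap (endpoints inclusive)."""
    return a[0] <= b[1] and b[0] <= a[1]

def check_overlap(candidate, cand_span, corpus_fps, corpus_spans):
    """Return leakage signals for one candidate series: an exact value
    match against the corpus index, or a time-range intersection with
    any known corpus series."""
    fp = series_fingerprint(candidate)
    return {
        "exact_match": fp in corpus_fps,
        "span_overlap": any(spans_intersect(cand_span, s) for s in corpus_spans),
    }
```

Exact matching catches verbatim copies; span intersection is a weaker, noisier signal (overlapping time ranges do not prove shared data), which is why the verification procedure itself is load-bearing.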

What would settle it

Finding that any of the new datasets appear in a TSFM pretraining corpus, or observing that model rankings on TempusBench tasks remain unchanged from traditional benchmarks without gains on non-stationarity metrics, would undermine the framework's value.
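The ranking-stability test is easy to make concrete. A small sketch — the function below is illustrative, not part of TempusBench — computes the Spearman correlation between two model orderings; a value near 1.0 for TempusBench against a traditional benchmark would suggest the new tasks add little discriminative signal:

```python
def spearman(order_a, order_b):
    """Spearman rank correlation between two best-to-worst orderings of
    the same models, assuming no ties. 1.0 = identical rankings,
    -1.0 = fully reversed."""
    assert set(order_a) == set(order_b) and len(order_a) > 1
    n = len(order_a)
    rank_b = {model: i for i, model in enumerate(order_b)}
    d2 = sum((i - rank_b[m]) ** 2 for i, m in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))
```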

Original abstract

Foundation models have transformed natural language processing and computer vision, and a rapidly growing literature on time-series foundation models (TSFMs) seeks to replicate this success in forecasting. While recent open-source models demonstrate the promise of TSFMs, the field lacks a comprehensive and community-accepted model evaluation framework. We see at least four major issues impeding progress on the development of such a framework. First, existing evaluation frameworks comprise benchmark forecasting tasks derived from often outdated datasets (e.g., M3), many of which lack clear metadata and overlap with the corpora used to pre-train TSFMs. Second, these frameworks evaluate models along a narrowly defined set of benchmark forecasting tasks, such as forecast horizon length or domain, but overlook core statistical properties such as non-stationarity and seasonality. Third, domain-specific models (e.g., XGBoost) are often compared unfairly, as existing frameworks do not enforce a systematic and consistent hyperparameter tuning convention for all models. Fourth, visualization tools for interpreting comparative performance are lacking. To address these issues, we introduce TempusBench, an open-source evaluation framework for TSFMs. TempusBench consists of 1) new datasets which are not included in existing TSFM pretraining corpora, 2) a set of novel benchmark tasks that go beyond existing ones, 3) a model evaluation pipeline with a standardized hyperparameter tuning protocol, and 4) a tensorboard-based visualization interface. We provide access to our code on GitHub: https://github.com/Smlcrm/TempusBench and maintain a live leaderboard at https://benchmark.smlcrm.com/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper identifies four issues in existing TSFM evaluation frameworks: benchmark tasks derived from outdated datasets (e.g., M3) that overlap with TSFM pretraining corpora, narrowly defined tasks that overlook statistical properties such as non-stationarity and seasonality, unfair comparisons with domain-specific models due to inconsistent hyperparameter tuning, and lack of visualization tools. It proposes TempusBench as an open-source framework consisting of new datasets not included in existing TSFM pretraining corpora, novel benchmark tasks, a standardized hyperparameter tuning protocol, and a TensorBoard-based visualization interface, with code released on GitHub and a live leaderboard.

Significance. If the new datasets are verifiably absent from pretraining corpora and the novel tasks demonstrably better isolate properties like non-stationarity and seasonality, TempusBench could establish a more reliable and standardized evaluation protocol for time-series foundation models. The explicit code release and public leaderboard are concrete strengths that support reproducibility and community adoption.

major comments (3)
  1. [Abstract] Abstract: The core claim that the new datasets 'are not included in existing TSFM pretraining corpora' is asserted without naming the datasets, describing any overlap verification procedure (e.g., against TimesFM, Chronos, or Lag-Llama corpora), or providing empirical checks; this verification is load-bearing for solving the data-leakage issue identified as the first major problem.
  2. [Abstract] Abstract: The assertion that the 'novel benchmark tasks' better capture overlooked statistical properties such as non-stationarity and seasonality than prior benchmarks (e.g., M3) is made without any comparative statistics, ablation results, or task definitions; this evidence is required to substantiate the second identified issue.
  3. [Abstract] Abstract / framework description: The standardized hyperparameter tuning protocol is presented at a high level with no specifics on the tuning procedure, search space, or validation that it produces fair comparisons across TSFMs and baselines such as XGBoost; this detail is necessary to address the third issue.
minor comments (2)
  1. [Abstract] The GitHub link and leaderboard URL are provided, supporting reproducibility; consider adding a table in the main text that lists the new datasets with basic metadata (size, domain, statistical properties) to improve clarity.
  2. [Abstract] The abstract references specific prior models (TimesFM, Chronos, Lag-Llama) and datasets (M3) but does not include citations; adding inline references would aid readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments correctly identify areas where the abstract could be strengthened with additional specifics to better support our core claims. We will revise the abstract and, where appropriate, cross-reference the full manuscript to address these points directly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The core claim that the new datasets 'are not included in existing TSFM pretraining corpora' is asserted without naming the datasets, describing any overlap verification procedure (e.g., against TimesFM, Chronos, or Lag-Llama corpora), or providing empirical checks; this verification is load-bearing for solving the data-leakage issue identified as the first major problem.

    Authors: We agree that the abstract would benefit from greater specificity on this point. In the revised version we will name the datasets (e.g., the newly collected retail, energy, and climate series) and briefly describe the overlap verification procedure, which consists of exact string matching and temporal-range checks against the publicly released pretraining corpora of TimesFM, Chronos, and Lag-Llama. The full verification protocol, code, and empirical overlap statistics appear in Section 3.1 and Appendix A. We will add a concise summary sentence to the abstract. revision: yes

  2. Referee: [Abstract] Abstract: The assertion that the 'novel benchmark tasks' better capture overlooked statistical properties such as non-stationarity and seasonality than prior benchmarks (e.g., M3) is made without any comparative statistics, ablation results, or task definitions; this evidence is required to substantiate the second identified issue.

    Authors: We acknowledge the abstract is currently high-level on this claim. The revised abstract will include a short definition of the new tasks (e.g., controlled non-stationarity injection and seasonality decomposition benchmarks) together with a one-sentence summary of the comparative results. Detailed task definitions, ablation studies, and statistical comparisons versus M3 appear in Section 4.2 and Appendix B. We will incorporate a brief reference to these findings in the abstract. revision: yes

  3. Referee: [Abstract] Abstract / framework description: The standardized hyperparameter tuning protocol is presented at a high level with no specifics on the tuning procedure, search space, or validation that it produces fair comparisons across TSFMs and baselines such as XGBoost; this detail is necessary to address the third issue.

    Authors: We agree that the abstract would be improved by including concrete details on the protocol. In the revision we will add a sentence describing the procedure (time-series cross-validation with a fixed budget of 50 trials per model), the search spaces (e.g., learning-rate and depth grids for tree-based models, context-length and patch-size ranges for TSFMs), and the fairness validation (identical compute budget and validation scheme applied to all models including XGBoost). Full specifications and fairness checks are provided in Section 3.3 and Section 5. We will update the abstract accordingly. revision: yes
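The protocol the simulated authors describe — time-series cross-validation under a fixed per-model trial budget — can be sketched as follows. Both helpers are assumptions for illustration, not the released pipeline; the 50-trial budget comes from the simulated rebuttal, not the abstract:

```python
def rolling_origin_splits(n, horizon, n_folds):
    """Yield (train_end, test_range) pairs for rolling-origin evaluation:
    each fold trains on indices [0, train_end) and tests on the next
    `horizon` steps, walking the forecast origin backward in time."""
    for k in range(n_folds):
        test_end = n - k * horizon
        train_end = test_end - horizon
        if train_end <= 0:
            break
        yield train_end, range(train_end, test_end)

def tune(candidates, score_fn, budget=50):
    """Score at most `budget` candidate configurations with one shared
    validation score function, so every model -- foundation or
    domain-specific -- receives an identical trial budget."""
    scored = [(score_fn(c), c) for c in candidates[:budget]]
    return min(scored, key=lambda t: t[0])[1]
```

Applying the same splitter and budget to XGBoost and to a TSFM is what makes the comparison fair in the sense the referee asks for: no model's reported score benefits from extra tuning effort.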

Circularity Check

0 steps flagged

No circularity: framework proposal with no derivation chain

Full rationale

The paper introduces TempusBench as an open-source evaluation framework with four components: new datasets asserted absent from TSFM pretraining corpora, novel benchmark tasks, standardized hyperparameter tuning, and a visualization interface. No equations, fitted parameters, predictions, or first-principles derivations appear in the abstract or described structure. The central claims are empirical assertions about dataset novelty and task coverage, supported by external GitHub release rather than reducing to self-definition, self-citation, or renaming. This is a standard framework proposal without load-bearing mathematical steps that could exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework's utility rests on the unverified claim that new datasets avoid pretraining overlap and that the added tasks and tuning protocol deliver fairer, more informative evaluation than prior work.

axioms (2)
  • domain assumption Existing evaluation frameworks use outdated datasets that overlap with TSFM pretraining corpora and evaluate only along narrow dimensions such as horizon length or domain.
    Stated directly in the abstract as the motivation for the new framework.
  • ad hoc to paper The datasets introduced in TempusBench are not included in existing TSFM pretraining corpora.
    Claimed in the abstract but not demonstrated or evidenced within the provided text.

pith-pipeline@v0.9.0 · 5645 in / 1379 out tokens · 45814 ms · 2026-05-10T15:17:31.495005+00:00 · methodology

