pith.

arxiv: 2606.19560 · v1 · pith:PMOOIUUN · submitted 2026-06-17 · 💻 cs.LG

Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting

Pith reviewed 2026-06-26 20:54 UTC · model grok-4.3

classification 💻 cs.LG
keywords time series forecasting, epidemic modeling, influenza prediction, foundation models, mixture of experts, pretrained models, hospitalization data, spatial generalization

The pith

A mixture-of-experts model fusing multiple pretrained forecasters delivers the strongest performance on influenza epidemic time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates classical neural networks, numerical transformers, pretrained time series foundation models, and LLM-based methods on regional influenza-like illness and hospitalization data for one- to four-week-ahead forecasts under temporal and spatial generalization. It establishes that the mixture-of-experts fusion of several pretrained forecasters yields the best overall accuracy, showing that different pretraining sources supply complementary signals about epidemic spread. Pretraining yields its clearest benefits at longer horizons when the source domain aligns mechanistically with influenza dynamics, while LLM approaches lag behind numerical forecasters. Hospitalization data adds value as an auxiliary input or pretraining source in selected cases. These comparisons supply concrete guidance on architecture choice and data use for epidemic preparedness.
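The two generalization settings differ in what a model may see during training. The following is a minimal illustration, not the paper's pipeline. It assumes the surveillance data are stored as a NumPy array of shape (regions, weeks). The function names `temporal_split` and `spatial_split` and the parameters `cutoff` and `lookback` are hypothetical.

```python
import numpy as np

def temporal_split(data, cutoff, lookback):
    """Temporal generalization: every region is seen in training, but only up to `cutoff`.

    data: array of shape (regions, weeks). The test slice keeps `lookback` weeks
    before the cutoff so that the first test forecast has a full input window.
    Its 1-week-ahead target is week `cutoff`, so no test target leaks into training.
    """
    if not lookback <= cutoff <= data.shape[1]:
        raise ValueError("need lookback <= cutoff <= number of weeks")
    return data[:, :cutoff], data[:, cutoff - lookback:]

def spatial_split(data, test_regions):
    """Spatial generalization: whole regions are held out and never seen in training."""
    mask = np.zeros(data.shape[0], dtype=bool)
    mask[list(test_regions)] = True
    return data[~mask], data[mask]
```

In the temporal setting, the model sees every region, but only its earlier seasons. In the spatial setting, the held-out regions never appear in training, so the split probes transfer to unseen locations.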

Core claim

Across influenza forecasting tasks, a mixture-of-experts model that fuses multiple pretrained forecasters achieves the strongest overall performance, indicating that heterogeneous pretrained representations provide complementary predictive information. Numerical transformer-based models produce reliable forecasts, while pretraining provides the largest gains at longer horizons, particularly when the pretraining domain is mechanistically aligned with influenza dynamics. LLM-based time series methods underperform relative to numerical forecasters. Examining hospitalization information as both an auxiliary covariate and a pretraining source clarifies when additional surveillance streams enhance the robustness of multi-horizon forecasting.

What carries the argument

A mixture-of-experts model that fuses multiple pretrained forecasters to combine complementary representations from heterogeneous time series pretraining.
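This page does not describe how the MoE gate is built. As a minimal sketch only, assume the experts are frozen pretrained forecasters whose 1–4-week forecasts are precomputed. Assume also that a softmax gate, conditioned on features of the input window, learns per-sample weights over those experts. The class `ForecastMoE`, its linear gate, and the plain gradient-descent loop are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class ForecastMoE:
    """Hypothetical input-dependent gate over K frozen expert forecasts."""

    def __init__(self, n_features, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_features, n_experts))
        self.b = np.zeros(n_experts)

    def weights(self, X):
        # X: (n, n_features) summary of each input window -> (n, K) gate weights
        return softmax(X @ self.W + self.b)

    def predict(self, X, F):
        # F: expert forecasts of shape (n, K, H) for H horizons (e.g. H = 4)
        return np.einsum("nk,nkh->nh", self.weights(X), F)

    def fit(self, X, F, Y, lr=0.1, epochs=500):
        """Train only the gate by gradient descent on mean squared error."""
        n, _, H = F.shape
        for _ in range(epochs):
            w = self.weights(X)
            err = np.einsum("nk,nkh->nh", w, F) - Y  # (n, H)
            # dMSE/dw_nk, with MSE averaged over samples and horizons
            g_w = 2.0 * np.einsum("nh,nkh->nk", err, F) / (n * H)
            # backpropagate through the softmax: dz_j = w_j * (g_j - sum_k w_k g_k)
            g_z = w * (g_w - (w * g_w).sum(axis=1, keepdims=True))
            self.W -= lr * X.T @ g_z
            self.b -= lr * g_z.sum(axis=0)
        return self
```

Because the gate is input-dependent, the weights can shift between experts across regions or phases of a season, which a fixed-weight ensemble cannot do. For the same reason, the effect of routing is hard to separate from the effect of expert heterogeneity without dedicated controls.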

If this is right

  • Heterogeneous pretrained representations supply complementary predictive information that improves epidemic time series forecasts.
  • Pretraining gains are largest at longer horizons when the source domain aligns mechanistically with influenza dynamics.
  • Numerical transformer models remain reliable while LLM-based methods underperform on this class of structured count data.
  • Hospitalization signals improve robustness when used as auxiliary covariates or pretraining sources in selected multi-horizon settings.
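Using hospitalization as an auxiliary covariate amounts to feeding it to the model as an extra input channel while the target remains ILI. This is a minimal sketch. It assumes weekly ILI and hospitalization series for one region that are already aligned week by week. The name `make_covariate_windows` and the `lookback` parameter are hypothetical, and the 1–4-week horizons follow the paper's setting.

```python
import numpy as np

def make_covariate_windows(ili, hosp, lookback, horizons=(1, 2, 3, 4)):
    """Stack ILI and hospitalization histories as two input channels.

    Targets are ILI only; hospitalization enters as an auxiliary covariate.
    Returns X of shape (n, lookback, 2) and Y of shape (n, len(horizons)).
    """
    ili = np.asarray(ili, dtype=float)
    hosp = np.asarray(hosp, dtype=float)
    if ili.shape != hosp.shape:
        raise ValueError("ILI and hospitalization series must be aligned week by week")
    max_h = max(horizons)
    X, Y = [], []
    for t in range(lookback, len(ili) - max_h + 1):
        # input window covers weeks t - lookback .. t - 1 for both channels
        X.append(np.stack([ili[t - lookback:t], hosp[t - lookback:t]], axis=-1))
        # the h-week-ahead target is h weeks after the last observed week (t - 1)
        Y.append([ili[t - 1 + h] for h in horizons])
    return np.array(X), np.array(Y)
```

To use hospitalization as a pretraining source instead, the same windowing could be applied to the hospitalization series alone before fine-tuning on ILI. This page does not say how the paper implements either variant.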

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Fusion strategies may extend to forecasting other seasonal respiratory diseases whose dynamics share mechanistic features with influenza.
  • Public-health model pipelines could shift toward ensembles of domain-aligned foundation models rather than single architectures.
  • The observed complementarity implies that future pretraining corpora should deliberately sample from multiple epidemic mechanisms to maximize transfer.

Load-bearing premise

The chosen influenza-like illness surveillance and hospitalization time series under the stated temporal and spatial generalization settings are representative enough to support general conclusions about model architectures for epidemic forecasting.

What would settle it

A single model or non-mixture architecture achieving consistently lower error than the mixture-of-experts across all tasks, horizons, and both data types on the same surveillance series would falsify the superiority of the fused approach.
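The falsification criterion can be written as a dominance check over an error table. This sketch assumes errors are stored per model and per setting, with lower values being better. The function name `dominating_models`, the `"MoE"` key, and the tuple layout of a setting are hypothetical.

```python
def dominating_models(errors, reference="MoE"):
    """Return models whose error is strictly below the reference model's in every setting.

    errors: {model_name: {setting: error}}; a setting could be a
    (split, horizon, data_type) tuple such as ("spatial", 4, "hospitalization").
    A model that is missing any setting cannot dominate.
    """
    ref = errors[reference]
    return [
        name
        for name, errs in errors.items()
        if name != reference and all(s in errs and errs[s] < ref[s] for s in ref)
    ]
```

A non-empty result matches the falsifying outcome described above. An empty result only shows that no single competitor dominates everywhere, which is weaker than showing that the MoE is best in every setting.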

Figures

Figures reproduced from arXiv: 2606.19560 by Alireza Jafari, Aniruddha Adiga, Geoffrey C. Fox, Judy Fox, Madhav Marathe.

Figure 2: Temporal evaluation on ILI over 1–4-week-ahead forecasting horizons. Normalized sum over regions, comparing model predictions with the observed values over the test data.

…despite claims of strong ILI performance in its paper [12]. In part, this seems tied to a mismatch in horizon design: TimeLLM emphasizes very long or non-standard horizons that are less aligned with CDC’s 1–4-week operational focus; when ev…
Original abstract

Seasonal influenza infects millions of people and causes substantial morbidity and mortality in the United States each year, making accurate short-term forecasting a core public-health need. Reliable forecasts of epidemic time series can inform vaccination timing, hospital staffing, and resource allocation, yet the comparative behavior of modern forecasting architectures on infectious-disease surveillance data remains insufficiently characterized. We address this gap through a systematic evaluation of regional influenza forecasting using influenza-like illness surveillance and influenza-associated hospitalization time series under both temporal and spatial generalization settings for 1-4-week-ahead prediction. We compare classical neural network architectures, numerical transformer-based models, pretrained time series foundation models, and LLM-based forecasting approaches. Across tasks, we demonstrate that a mixture-of-experts model that fuses multiple pretrained forecasters achieves the strongest overall performance, indicating that heterogeneous pretrained representations provide complementary predictive information. Our results further show that numerical transformer-based models produce reliable forecasts, while pretraining provides the largest gains at longer horizons, particularly when the pretraining domain is mechanistically aligned with influenza dynamics. In contrast, LLM-based time series methods underperform relative to numerical forecasters in this setting. Finally, we examine hospitalization information as both an auxiliary covariate and a pretraining source. Hospitalization signals provide complementary improvements in selected settings and clarify when additional surveillance streams enhance the robustness of multi-horizon forecasting. These findings provide actionable guidance on model selection, pretraining strategy, and auxiliary-signal use for influenza preparedness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript evaluates classical neural networks, numerical transformers, pretrained time series foundation models, and LLM-based forecasters on U.S. regional influenza-like illness and hospitalization time series for 1-4-week-ahead prediction under temporal and spatial generalization. It reports that a mixture-of-experts model fusing multiple pretrained forecasters attains the strongest overall performance and interprets this as evidence that heterogeneous pretrained representations supply complementary predictive information. Additional claims concern the benefits of pretraining (especially domain-aligned) at longer horizons, the relative weakness of LLM-based methods, and the value of hospitalization signals as covariates or pretraining sources.

Significance. If the performance ordering and the complementarity interpretation are substantiated by appropriate controls, the work would supply actionable model-selection guidance for epidemic forecasting and clarify when pretraining and auxiliary streams improve multi-horizon robustness. The absence of such controls currently limits the strength of the headline claim.

major comments (1)
  1. [Abstract] Abstract: the claim that the MoE fusing multiple pretrained forecasters 'indicates that heterogeneous pretrained representations provide complementary predictive information' is not isolated by the reported experiments. No ablations are described that would distinguish heterogeneity from (a) the MoE routing mechanism itself, (b) ensemble size, or (c) the training procedure; controls such as an MoE built from repeated copies of a single pretrained model or a non-MoE ensemble (simple averaging or stacking) of the same forecasters are required to support the interpretation.
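The two non-MoE controls the referee requests can be sketched as follows. The sketch assumes each expert's 1–4-week forecasts are precomputed as an array of shape (samples, experts, horizons), and the helper names are hypothetical. The third control needs no new code. It is the same MoE trained with K copies of one pretrained forecaster in place of the heterogeneous experts, so any gain that remains cannot come from heterogeneity.

```python
import numpy as np

def average_ensemble(F):
    """Non-MoE control: equal-weight average of expert forecasts. F: (n, K, H)."""
    return F.mean(axis=1)

def fit_stacking(F_val, Y_val):
    """Non-MoE control: one global weight per expert, fit by least squares on validation data.

    Unlike an MoE gate, these weights do not depend on the input window.
    """
    n, K, H = F_val.shape
    A = F_val.transpose(0, 2, 1).reshape(n * H, K)  # one row per (sample, horizon)
    w, *_ = np.linalg.lstsq(A, Y_val.reshape(n * H), rcond=None)
    return w

def stacked_forecast(F, w):
    """Apply fixed stacking weights w (shape (K,)) to expert forecasts F (n, K, H)."""
    return np.einsum("nkh,k->nh", F, w)
```

If the MoE beats stacking over the same experts, that points to routing. If stacking over heterogeneous experts beats an MoE over copies of one model, that points to heterogeneity.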

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the MoE fusing multiple pretrained forecasters 'indicates that heterogeneous pretrained representations provide complementary predictive information' is not isolated by the reported experiments. No ablations are described that would distinguish heterogeneity from (a) the MoE routing mechanism itself, (b) ensemble size, or (c) the training procedure; controls such as an MoE built from repeated copies of a single pretrained model or a non-MoE ensemble (simple averaging or stacking) of the same forecasters are required to support the interpretation.

    Authors: We agree that the reported experiments do not include the specific ablations needed to isolate the contribution of heterogeneous pretrained representations from the MoE routing mechanism, ensemble size, or training procedure. While the MoE outperforms the individual pretrained models and other forecasters, this does not fully distinguish the sources of improvement. We will revise the abstract to qualify the interpretive claim and will add the suggested controls (MoE with repeated copies of one model and non-MoE ensembles such as averaging or stacking) in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical model comparisons rest on external benchmarks

Full rationale

The paper conducts a systematic empirical evaluation of forecasting architectures on influenza-like illness and hospitalization time series under temporal and spatial generalization. The headline result (MoE fusing pretrained forecasters) is presented as an observed performance ordering across tasks, with no equations, fitted parameters renamed as predictions, or derivations that reduce to inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the central claims; all conclusions are tied to direct comparisons against held-out data. This is the standard non-circular case for an evaluation study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no equations, derivations, or methodological sections from which to extract free parameters, axioms, or invented entities; ledger is therefore empty.

pith-pipeline@v0.9.1-grok · 5801 in / 1155 out tokens · 27414 ms · 2026-06-26T20:54:17.666947+00:00 · methodology


Reference graph

Works this paper leans on

55 extracted references · 29 canonical work pages

  1. [1]

    Estimating influenza disease burden from population-based surveillance data in the United States

    C. Reed, S. S. Chaves, P. Daily Kirley, R. Emerson, D. Aragon, E. B. Hancock, L. Butler, J. Baumbach, G. Hollick, N. M. Bennett et al., “Estimating influenza disease burden from population-based surveillance data in the United States,” PLOS ONE, vol. 10, no. 3, p. e0118369, 2015. [Online]. Available: https://doi.org/10.1371/journal.pone.0118369

  2. [2]

    A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States

    N. G. Reich, L. C. Brooks, S. J. Fox, S. Kandula, C. J. McGowan, E. Moore, D. Osthus, E. L. Ray, A. Tushar, T. K. Yamana et al., “A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States,” Proceedings of the National Academy of Sciences, vol. 116, no. 8, pp. 3146–3154, 2019. [Online]. Available: https://doi.org...

  3. [3]

    The United States COVID-19 Forecast Hub dataset

    E. Y. Cramer, Y. Huang, Y. Wang, E. L. Ray, M. Cornell, J. Bracher, A. Brennen, A. J. Castro Rivadeneira, A. Gerding, K. House, D. Jayawardena, A. H. Kanji, A. Khandelwal, K. Le, J. Niemi, A. Stark, A. Shah, N. Wattanachit, M. W. Zorn, N. G. Reich, and US COVID-19 Forecast Hub Consortium, “The United States COVID-19 Forecast Hub dataset,” Scientific Dat...

  4. [4]

    Deep learning foundation and pattern models: Challenges in hydrological time series

    J. He, Y.-J. Chen, A. Jafari, A. Idamekorala, and G. Fox, “Deep learning foundation and pattern models: Challenges in hydrological time series,” The International Journal of High Performance Computing Applications, vol. 40, no. 1, pp. 22–41, 2026. [Online]. Available: https://doi.org/10.1177/10943420251380008

  5. [5]

    Chronos: Learning the language of time series

    A. F. Ansari, L. Stella, C. Turkmen, X. Zhang, P. Mercado, H. Shen, O. Shchur, S. S. Rangapuram, S. P. Arango, S. Kapoor et al., “Chronos: Learning the language of time series,” 2024. [Online]. Available: https://arxiv.org/abs/2403.07815

  6. [6]

    GCNET: Graph-based prediction of stock price movement using graph convolutional network

    A. Jafari and S. Haratizadeh, “GCNET: Graph-based prediction of stock price movement using graph convolutional network,” Engineering Applications of Artificial Intelligence, vol. 116, p. 105452, 2022. [Online]. Available: https://doi.org/10.1016/j.engappai.2022.105452

  7. [7]

    Time series foundation models and deep learning architectures for earthquake temporal and spatial nowcasting

    A. Jafari, G. Fox, J. B. Rundle, A. Donnellan, and L. G. Ludwig, “Time series foundation models and deep learning architectures for earthquake temporal and spatial nowcasting,” GeoHazards, vol. 5, no. 4, pp. 1247–1274, 2024. [Online]. Available: https://doi.org/10.3390/geohazards5040059

  8. [8]

    NETpred: Network-based modeling and prediction of multiple connected market indices

    A. Jafari and S. Haratizadeh, “NETpred: Network-based modeling and prediction of multiple connected market indices,” 2022. [Online]. Available: https://arxiv.org/abs/2212.05916

  9. [9]

    COVID-Transformer: Interpretable COVID-19 detection using vision transformer for healthcare

    D. Shome, T. Kar, S. N. Mohanty, P. Tiwari, K. Muhammad, A. AlTameem, Y. Zhang, and A. K. J. Saudagar, “COVID-Transformer: Interpretable COVID-19 detection using vision transformer for healthcare,” International Journal of Environmental Research and Public Health, vol. 18, no. 21, p. 11086, 2021. [Online]. Available: https://doi.org/10.3390/ijerph182111086

  10. [10]

    Interpreting county-level COVID-19 infections using transformer and deep learning time series models

    M. K. Islam, Y. Liu, A. Erkelens, N. Daniello, A. Marathe, and J. Fox, “Interpreting county-level COVID-19 infections using transformer and deep learning time series models,” in 2023 IEEE International Conference on Digital Health (ICDH). IEEE, 2023, pp. 266–277. [Online]. Available: https://doi.org/10.1109/ICDH60066.2023.00046

  11. [11]

    TimesNet: Temporal 2D-variation modeling for general time series analysis

    H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, “TimesNet: Temporal 2D-variation modeling for general time series analysis,” arXiv preprint arXiv:2210.02186, 2022. [Online]. Available: https://arxiv.org/abs/2210.02186

  12. [12]

    Time-LLM: Time series forecasting by reprogramming large language models

    M. Jin, S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, P.-Y. Chen, Y. Liang, Y.-F. Li, S. Pan, and Q. Wen, “Time-LLM: Time series forecasting by reprogramming large language models,” in International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=Unb5CVPtae

  13. [13]

    Collaborative efforts to forecast seasonal influenza in the United States, 2015–2016

    C. J. McGowan, M. Biggerstaff, M. Johansson, K. M. Apfeldorf, M. Ben-Nun, L. Brooks, M. Convertino, M. Erraguntla, D. C. Farrow, J. Freeze et al., “Collaborative efforts to forecast seasonal influenza in the United States, 2015–2016,” Scientific Reports, vol. 9, no. 1, p. 683, 2019. [Online]. Available: https://doi.org/10.1038/s41598-018-36361-9

  14. [14]

    Graph neural network for traffic forecasting: A survey

    W. Jiang and J. Luo, “Graph neural network for traffic forecasting: A survey,” Expert Systems with Applications, vol. 207, p. 117921, 2022. [Online]. Available: https://doi.org/10.1016/j.eswa.2022.117921

  15. [15]

    The predictive skill of convolutional neural networks models for disease forecasting

    K. Lee, J. Ray, and C. Safta, “The predictive skill of convolutional neural networks models for disease forecasting,” PLOS ONE, vol. 16, no. 7, p. e0254319, 2021. [Online]. Available: https://doi.org/10.1371/journal.pone.0254319

  16. [16]

    CausalGNN: Causal-based graph neural networks for spatio-temporal epidemic forecasting

    L. Wang, A. Adiga, J. Chen, A. Sadilek, S. Venkatramanan, and M. Marathe, “CausalGNN: Causal-based graph neural networks for spatio-temporal epidemic forecasting,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 11, pp. 12191–12199, 2022. [Online]. Available: https://doi.org/10.1609/aaai.v36i11.21479

  17. [17]

    Enhancing deep traffic forecasting models with dynamic regression

    V. Z. Zheng, S. Choi, and L. Sun, “Enhancing deep traffic forecasting models with dynamic regression,” 2023. [Online]. Available: https://arxiv.org/abs/2301.06650

  18. [18]

    A comparison of infectious disease forecasting methods across locations, diseases, and time

    S. Dixon, R. Keshavamurthy, D. H. Farber, A. Stevens, K. T. Pazdernik, and L. E. Charles, “A comparison of infectious disease forecasting methods across locations, diseases, and time,” Pathogens, vol. 11, no. 2, p. 185, 2022. [Online]. Available: https://doi.org/10.3390/pathogens11020185

  19. [19]

    Applying infectious disease forecasting to public health: A path forward using influenza forecasting examples

    C. S. Lutz, M. P. Huynh, M. Schroeder, S. Anyatonwu, F. S. Dahlgren, G. Danyluk, D. Fernandez, S. K. Greene, N. Kipshidze, L. Liu et al., “Applying infectious disease forecasting to public health: A path forward using influenza forecasting examples,” BMC Public Health, vol. 19, p. 1659, 2019. [Online]. Available: https://doi.org/10.1186/s12889-019-7966-8

  20. [20]

    SEIR modeling of the COVID-19 and its dynamics

    S. He, Y. Peng, and K. Sun, “SEIR modeling of the COVID-19 and its dynamics,” Nonlinear Dynamics, vol. 101, pp. 1667–1680, 2020. [Online]. Available: https://doi.org/10.1007/s11071-020-05743-y

  21. [21]

    A simplicial epidemic model for COVID-19 spread analysis

    Y. Chen, Y. R. Gel, M. V. Marathe, and H. V. Poor, “A simplicial epidemic model for COVID-19 spread analysis,” Proceedings of the National Academy of Sciences, vol. 121, no. 1, p. e2313171120, 2024. [Online]. Available: https://doi.org/10.1073/pnas.2313171120

  22. [22]

    Informing university COVID-19 decisions using simple compartmental models

    B. Hurt, A. Adiga, M. Marathe, and C. L. Barrett, “Informing university COVID-19 decisions using simple compartmental models,” in 2021 Winter Simulation Conference (WSC), 2021, pp. 1–12. [Online]. Available: https://doi.org/10.1109/WSC52266.2021.9715467

  23. [23]

    Rational evaluation of various epidemic models based on the COVID-19 data of China

    W. Yang, D. Zhang, L. Peng, C. Zhuge, and L. Hong, “Rational evaluation of various epidemic models based on the COVID-19 data of China,” Epidemics, vol. 37, p. 100501, 2021. [Online]. Available: https://doi.org/10.1016/j.epidem.2021.100501

  24. [24]

    An overview of forecast analysis with ARIMA models during the COVID-19 pandemic: Methodology and case study in Brazil

    R. Ospina, J. A. M. Gondim, V. Leiva, and C. Castro, “An overview of forecast analysis with ARIMA models during the COVID-19 pandemic: Methodology and case study in Brazil,” Mathematics, vol. 11, no. 14, p. 3069, 2023. [Online]. Available: https://doi.org/10.3390/math11143069

  25. [25]

    Prediction of global Omicron pandemic using ARIMA, MLR, and Prophet models

    D. Zhao, R. Zhang, H. Zhang, and S. He, “Prediction of global Omicron pandemic using ARIMA, MLR, and Prophet models,” Scientific Reports, vol. 12, p. 18138, 2022. [Online]. Available: https://doi.org/10.1038/s41598-022-23154-4

  26. [26]

    Phase-informed Bayesian ensemble models improve performance of COVID-19 forecasts

    A. Adiga, G. Kaur, L. Wang, B. Hurt, P. Porebski, S. Venkatramanan, B. Lewis, and M. V. Marathe, “Phase-informed Bayesian ensemble models improve performance of COVID-19 forecasts,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 13, pp. 15647–15653, 2024. [Online]. Available: https://doi.org/10.1609/aaai.v37i13.26855

  27. [27]

    Cola-GNN: Cross-location attention based graph neural networks for long-term ILI prediction

    S. Deng, S. Wang, H. Rangwala, L. Wang, and Y. Ning, “Cola-GNN: Cross-location attention based graph neural networks for long-term ILI prediction,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 245–254. [Online]. Available: https://doi.org/10.1145/3340531.3411975

  28. [28]

    EpiGNN: Exploring spatial transmission with graph neural network for regional epidemic forecasting

    F. Xie, Z. Zhang, L. Li, B. Zhou, and Y. Tan, “EpiGNN: Exploring spatial transmission with graph neural network for regional epidemic forecasting,” in Machine Learning and Knowledge Discovery in Databases: ECML PKDD 2022, ser. Lecture Notes in Computer Science, vol. 13718. Springer, 2023, pp. 469–485. [Online]. Available: https://doi.org/10.1007/978-3-031...

  29. [29]

    Epidemiology-aware deep learning for infectious disease dynamics prediction

    M. Liu, Y. Liu, and J. Liu, “Epidemiology-aware deep learning for infectious disease dynamics prediction,” in Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 4084–4088. [Online]. Available: https://doi.org/10.1145/3583780.3615139

  30. [30]

    RESEAT: Recurrent self-attention network for multi-regional influenza forecasting

    J. Moon, S. Jung, S. Park, and E. Hwang, “RESEAT: Recurrent self-attention network for multi-regional influenza forecasting,” IEEE Journal of Biomedical and Health Informatics, vol. 27, no. 5, pp. 2585–2596, 2023. [Online]. Available: https://doi.org/10.1109/JBHI.2023.3247687

  31. [31]

    Self-attention-based deep learning network for regional influenza forecasting

    S. Jung, J. Moon, S. Park, and E. Hwang, “Self-attention-based deep learning network for regional influenza forecasting,” IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 2, pp. 922–933, 2022. [Online]. Available: https://doi.org/10.1109/JBHI.2021.3093897

  32. [32]

    An empirical evaluation of generic convolutional and recurrent networks for sequence modeling

    S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” 2018. [Online]. Available: https://arxiv.org/abs/1803.01271

  33. [33]

    Long-term forecasting with TiDE: Time-series dense encoder

    A. Das, W. Kong, A. Leach, S. Mathur, R. Sen, and R. Yu, “Long-term forecasting with TiDE: Time-series dense encoder,” Transactions on Machine Learning Research, 2023. [Online]. Available: https://openreview.net/forum?id=pCbC3aQB5W

  34. [34]

    A time series is worth 64 words: Long-term forecasting with transformers

    Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,” in International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=Jbdc0vTOcol

  35. [35]

    iTransformer: Inverted transformers are effective for time series forecasting

    Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “iTransformer: Inverted transformers are effective for time series forecasting,” in International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=JePfAI8fah

  36. [36]

    Temporal fusion transformers for interpretable multi-horizon time series forecasting

    B. Lim, S. O. Arik, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021. [Online]. Available: https://doi.org/10.1016/j.ijforecast.2021.03.012

  37. [37]

    FluSight: Forecasts of flu hospital admissions

    Centers for Disease Control and Prevention, “FluSight: Forecasts of flu hospital admissions,” Online, 2023, accessed: 2026-06- [Online]. Available: https://www.cdc.gov/flu-forecasting/data-vis/current-week.html

  38. [38]

    Monash time series forecasting archive

    R. Godahewa, C. Bergmeir, G. I. Webb, R. J. Hyndman, and P. Montero-Manso, “Monash time series forecasting archive,” in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021. [Online]. Available: https://openreview.net/forum?id=I01l7rc0jcb

  39. [39]

    The M4 competition: 100,000 time series and 61 forecasting methods

    S. Makridakis, E. Spiliotis, and V. Assimakopoulos, “The M4 competition: 100,000 time series and 61 forecasting methods,” International Journal of Forecasting, vol. 36, no. 1, pp. 54–74, 2020. [Online]. Available: https://doi.org/10.1016/j.ijforecast.2019.04.014

  40. [40]

    Application of a normalized Nash–Sutcliffe efficiency to improve the accuracy of the Sobol’ sensitivity analysis of a hydrological model

    J. Nossent and W. Bauwens, “Application of a normalized Nash–Sutcliffe efficiency to improve the accuracy of the Sobol’ sensitivity analysis of a hydrological model,” in EGU General Assembly Conference Abstracts, vol. 14, 2012, p. 237. [Online]. Available: https://meetingorganizer.copernicus.org/EGU2012/EGU2012-237.pdf

  41. [41]

    Position: Temporal measurement interval determines computational and model complexity in single-cell perturbation analysis

    A. Jafari, H. Shakeri, and H. Daneshmand, “Position: Temporal measurement interval determines computational and model complexity in single-cell perturbation analysis,” in Proceedings of the 43rd International Conference on Machine Learning, 2026, spotlight position paper. [Online]. Available: https://openreview.net/forum?id=lECKpTE1lW

  42. [42]

    NeuralForecast: User-friendly state-of-the-art neural forecasting models

    K. G. Olivares, C. Challu, F. Garza, M. Mergenthaler Canseco, and A. Dubrawski, “NeuralForecast: User-friendly state-of-the-art neural forecasting models,” PyCon Salt Lake City, Utah, US, 2022. [Online]. Available: https://github.com/Nixtla/neuralforecast

  43. [43]

    Statsmodels: Econometric and statistical modeling with Python

    S. Seabold and J. Perktold, “Statsmodels: Econometric and statistical modeling with Python,” in Proceedings of the 9th Python in Science Conference, Austin, TX, 2010, pp. 92–96. [Online]. Available: https://conference.scipy.org/proceedings/scipy2010/seabold.html

  44. [44]

    Long short-term memory

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735

  45. [45]

    Time Series Analysis: Forecasting and Control

    G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, 5th ed. Hoboken, NJ: John Wiley & Sons, 2015. [Online]. Available: https://www.wiley.com/en-us/Time+Series+Analysis%3A+Forecasting+and+Control%2C+5th+Edition-p-9781118675021

  46. [46]

    Attention is all you need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, vol. 30, 2017. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html

  47. [47]

    Informer: Beyond efficient transformer for long sequence time-series forecasting

    H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 12, 2021, pp. 11106–11115. [Online]. Available: https://doi.org/10.1609/aaai.v35i12.17325

  48. [48]

    TSMixer: An all-MLP architecture for time series forecasting

    S.-A. Chen, C.-L. Li, N. Yoder, S. O. Arik, and T. Pfister, “TSMixer: An all-MLP architecture for time series forecasting,” 2023. [Online]. Available: https://arxiv.org/abs/2303.06053

APPENDIX: MODELS’ CONFIGURATIONS AND HYPERPARAMETERS

This appendix reports the implementation details and hyperparameter settings used to reproduce the main forecasting expe...