Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting
Pith reviewed 2026-06-26 20:54 UTC · model grok-4.3
The pith
A mixture-of-experts model fusing multiple pretrained forecasters delivers the strongest performance on influenza epidemic time series.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across influenza forecasting tasks, a mixture-of-experts model that fuses multiple pretrained forecasters achieves the strongest overall performance, indicating that heterogeneous pretrained representations provide complementary predictive information. Numerical transformer-based models produce reliable forecasts, while pretraining provides the largest gains at longer horizons, particularly when the pretraining domain is mechanistically aligned with influenza dynamics. LLM-based time series methods underperform relative to numerical forecasters. Examining hospitalization information as both an auxiliary covariate and a pretraining source clarifies when additional surveillance streams enhance the robustness of multi-horizon forecasting.
What carries the argument
A mixture-of-experts model that fuses multiple pretrained forecasters, combining complementary representations learned from heterogeneous time series pretraining.
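To make the fusion idea concrete, here is a minimal sketch of output-level mixture-of-experts gating over several forecasters. It is an illustration only: this summary does not describe the paper's gating design, so the softmax gate, the function names `softmax` and `moe_forecast`, and the toy numbers are all assumptions.

```python
import math

def softmax(scores):
    """Turn unnormalized gate scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forecast(expert_forecasts, gate_scores):
    """Fuse per-expert multi-horizon forecasts with softmax gate weights.

    expert_forecasts: list of K lists, each holding H horizon values.
    gate_scores: list of K unnormalized scores (hypothetical gate output).
    """
    weights = softmax(gate_scores)
    horizons = len(expert_forecasts[0])
    fused = [
        sum(w * f[h] for w, f in zip(weights, expert_forecasts))
        for h in range(horizons)
    ]
    return fused, weights

# Toy 1-4-week-ahead forecasts from three hypothetical pretrained experts.
experts = [
    [2.0, 2.2, 2.5, 2.9],
    [1.8, 2.1, 2.7, 3.3],
    [2.1, 2.0, 2.0, 2.1],
]
# Equal gate scores reduce the mixture to a plain average.
fused, weights = moe_forecast(experts, gate_scores=[0.0, 0.0, 0.0])
```

In a trained model the gate scores would typically depend on the input window, so different experts can dominate in different regimes; the equal-score case above is only a sanity check.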
If this is right
- Heterogeneous pretrained representations supply complementary predictive information that improves epidemic time series forecasts.
- Pretraining gains are largest at longer horizons when the source domain aligns mechanistically with influenza dynamics.
- Numerical transformer models remain reliable while LLM-based methods underperform on this class of structured count data.
- Hospitalization signals improve robustness when used as auxiliary covariates or pretraining sources in selected multi-horizon settings.
Where Pith is reading between the lines
- Fusion strategies may extend to forecasting other seasonal respiratory diseases whose dynamics share mechanistic features with influenza.
- Public-health model pipelines could shift toward ensembles of domain-aligned foundation models rather than single architectures.
- The observed complementarity implies that future pretraining corpora should deliberately sample from multiple epidemic mechanisms to maximize transfer.
Load-bearing premise
The chosen influenza-like illness surveillance and hospitalization time series under the stated temporal and spatial generalization settings are representative enough to support general conclusions about model architectures for epidemic forecasting.
What would settle it
A single model or non-mixture architecture achieving consistently lower error than the mixture-of-experts across all tasks, horizons, and both data types on the same surveillance series would falsify the superiority of the fused approach.
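The falsification condition above amounts to a dominance check over every evaluation cell. The sketch below shows one way to express it; the error values, the cell keys, and the function name `dominates` are hypothetical and are not taken from the paper.

```python
def dominates(candidate, moe):
    """Return True if `candidate` has strictly lower error than `moe`
    in every (data type, setting, weeks-ahead) cell."""
    return all(candidate[cell] < moe[cell] for cell in moe)

# Hypothetical errors keyed by (data type, generalization setting, weeks ahead).
moe = {
    ("ILI", "temporal", 1): 0.20,
    ("ILI", "temporal", 4): 0.35,
    ("hosp", "spatial", 1): 0.25,
    ("hosp", "spatial", 4): 0.40,
}
# Wins in some cells but not all: does not falsify the claim.
single_a = {
    ("ILI", "temporal", 1): 0.18,
    ("ILI", "temporal", 4): 0.37,
    ("hosp", "spatial", 1): 0.24,
    ("hosp", "spatial", 4): 0.41,
}
# Lower error everywhere: would falsify the claim.
single_b = {cell: err - 0.05 for cell, err in moe.items()}

a_beats_moe = dominates(single_a, moe)
b_beats_moe = dominates(single_b, moe)
```

A real check would also need to account for run-to-run variance, for example by comparing errors averaged over several seeds.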
Original abstract
Seasonal influenza infects millions of people and causes substantial morbidity and mortality in the United States each year, making accurate short-term forecasting a core public-health need. Reliable forecasts of epidemic time series can inform vaccination timing, hospital staffing, and resource allocation, yet the comparative behavior of modern forecasting architectures on infectious-disease surveillance data remains insufficiently characterized. We address this gap through a systematic evaluation of regional influenza forecasting using influenza-like illness surveillance and influenza-associated hospitalization time series under both temporal and spatial generalization settings for 1-4-week-ahead prediction. We compare classical neural network architectures, numerical transformer-based models, pretrained time series foundation models, and LLM-based forecasting approaches. Across tasks, we demonstrate that a mixture-of-experts model that fuses multiple pretrained forecasters achieves the strongest overall performance, indicating that heterogeneous pretrained representations provide complementary predictive information. Our results further show that numerical transformer-based models produce reliable forecasts, while pretraining provides the largest gains at longer horizons, particularly when the pretraining domain is mechanistically aligned with influenza dynamics. In contrast, LLM-based time series methods underperform relative to numerical forecasters in this setting. Finally, we examine hospitalization information as both an auxiliary covariate and a pretraining source. Hospitalization signals provide complementary improvements in selected settings and clarify when additional surveillance streams enhance the robustness of multi-horizon forecasting. These findings provide actionable guidance on model selection, pretraining strategy, and auxiliary-signal use for influenza preparedness.
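The two generalization settings and the 1-4-week-ahead target named in the abstract can be sketched as follows. This is an assumed setup for illustration: the region names, the lookback length of 8 weeks, the cutoff week, and the synthetic series are hypothetical, and the paper's actual preprocessing may differ.

```python
def make_windows(series, lookback, horizon=4):
    """Slice a weekly series into (input window, next-`horizon`-weeks target) pairs."""
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        x = series[start:start + lookback]
        y = series[start + lookback:start + lookback + horizon]
        pairs.append((x, y))
    return pairs

def temporal_split(data, cutoff):
    """Train on weeks before `cutoff`, test on later weeks, for every region."""
    train = {region: s[:cutoff] for region, s in data.items()}
    test = {region: s[cutoff:] for region, s in data.items()}
    return train, test

def spatial_split(data, held_out):
    """Train on some regions, test on regions never seen in training."""
    train = {r: s for r, s in data.items() if r not in held_out}
    test = {r: s for r, s in data.items() if r in held_out}
    return train, test

# Synthetic two-year weekly series for three hypothetical regions.
data = {f"region_{i}": [float(w % 52) for w in range(104)] for i in range(1, 4)}

pairs = make_windows(data["region_1"], lookback=8)
t_train, t_test = temporal_split(data, cutoff=78)
s_train, s_test = spatial_split(data, held_out={"region_3"})
```

In the temporal setting, the first test windows usually need their lookback context taken from the end of the training period; that detail is omitted here for brevity.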
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates classical neural networks, numerical transformers, pretrained time series foundation models, and LLM-based forecasters on U.S. regional influenza-like illness and hospitalization time series for 1-4-week-ahead prediction under temporal and spatial generalization. It reports that a mixture-of-experts model fusing multiple pretrained forecasters attains the strongest overall performance and interprets this as evidence that heterogeneous pretrained representations supply complementary predictive information. Additional claims concern the benefits of pretraining (especially domain-aligned) at longer horizons, the relative weakness of LLM-based methods, and the value of hospitalization signals as covariates or pretraining sources.
Significance. If the performance ordering and the complementarity interpretation are substantiated by appropriate controls, the work would supply actionable model-selection guidance for epidemic forecasting and clarify when pretraining and auxiliary streams improve multi-horizon robustness. The absence of such controls currently limits the strength of the headline claim.
major comments (1)
- [Abstract] The claim that the MoE fusing multiple pretrained forecasters 'indicates that heterogeneous pretrained representations provide complementary predictive information' is not isolated by the reported experiments. No ablations are described that would distinguish heterogeneity from (a) the MoE routing mechanism itself, (b) ensemble size, or (c) the training procedure; controls such as an MoE built from repeated copies of a single pretrained model or a non-MoE ensemble (simple averaging or stacking) of the same forecasters are required to support the interpretation.
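The non-MoE controls the referee asks for can be sketched in a few lines. The example below shows simple averaging and a two-forecaster stacking control fitted by grid search on validation error; it is a simplified stand-in (stacking is more often fitted by regression), and the function names and toy series are assumptions rather than anything reported in the manuscript.

```python
def mae(pred, truth):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def simple_average(forecasts):
    """Equal-weight ensemble: mean across forecasters at each time step."""
    k = len(forecasts)
    return [sum(vals) / k for vals in zip(*forecasts)]

def stack_two(f_a, f_b, truth, steps=100):
    """Stacking control for two forecasters: choose the convex weight on
    f_a that minimizes validation MAE, using a grid over [0, 1]."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(f_a, f_b)]
        err = mae(blended, truth)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Toy validation data: one forecaster biased high, the other biased low.
truth = [1.0, 2.0, 3.0, 4.0]
f_a = [1.5, 2.5, 3.5, 4.5]
f_b = [0.5, 1.5, 2.5, 3.5]

avg = simple_average([f_a, f_b])
avg_err = mae(avg, truth)
best_w, best_err = stack_two(f_a, f_b, truth)
```

If baselines like these matched the MoE on the same forecasters, the gain would be attributable to ensembling in general rather than to routing over heterogeneous representations.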
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address the major comment below.
Point-by-point responses
- Referee: [Abstract] The claim that the MoE fusing multiple pretrained forecasters 'indicates that heterogeneous pretrained representations provide complementary predictive information' is not isolated by the reported experiments. No ablations are described that would distinguish heterogeneity from (a) the MoE routing mechanism itself, (b) ensemble size, or (c) the training procedure; controls such as an MoE built from repeated copies of a single pretrained model or a non-MoE ensemble (simple averaging or stacking) of the same forecasters are required to support the interpretation.
Authors: We agree that the reported experiments do not include the specific ablations needed to isolate the contribution of heterogeneous pretrained representations from the MoE routing mechanism, ensemble size, or training procedure. While the MoE outperforms the individual pretrained models and other forecasters, this does not fully distinguish the sources of improvement. We will revise the abstract to qualify the interpretive claim and will add the suggested controls (MoE with repeated copies of one model and non-MoE ensembles such as averaging or stacking) in the revised manuscript.
Revision: yes
Circularity Check
No circularity: empirical model comparisons rest on external benchmarks
The paper conducts a systematic empirical evaluation of forecasting architectures on influenza-like illness and hospitalization time series under temporal and spatial generalization. The headline result (MoE fusing pretrained forecasters) is presented as an observed performance ordering across tasks, with no equations, fitted parameters renamed as predictions, or derivations that reduce to inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the central claims; all conclusions are tied to direct comparisons against held-out data. This is the standard non-circular case for an evaluation study.