pith. sign in

arxiv: 2411.08249 · v2 · submitted 2024-11-12 · 💻 cs.LG · cs.AI

Retrieval Augmented Time Series Forecasting

Pith reviewed 2026-05-23 16:55 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords retrieval augmented forecastingtime series foundation modelszero-shot forecastingRAG for time seriesforecast accuracyChronos
0
0 comments X

The pith

Retrieving similar past time series and feeding them into foundation models raises zero-shot forecasting accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Time-series foundation models struggle with zero-shot forecasting on dynamic, event-driven data that may lie outside their training distribution. The paper asks whether retrieval-augmented generation, already useful for language models, can be adapted to supply relevant past examples and improve predictions. It introduces Retrieval Augmented Forecasting (RAF) together with concrete retrieval and incorporation strategies. Experiments across domains show accuracy gains that grow larger as the underlying foundation model size increases. A reader would care because this offers a practical way to boost performance without retraining or enlarging the base model.

Core claim

Retrieval Augmented Forecasting (RAF) is a framework that retrieves related time-series examples and incorporates them into the input of time-series foundation models; this procedure improves forecasting accuracy across diverse domains, and the gains become larger for bigger TSFM sizes.

What carries the argument

Retrieval Augmented Forecasting (RAF) framework, which selects related time-series examples and augments the model input with them.

If this is right

  • RAF delivers measurable accuracy lifts on many different time-series domains.
  • The accuracy improvement scales up with the size of the underlying time-series foundation model.
  • The approach directly targets the dynamic and event-driven character of time-series data.
  • It provides a route to stronger zero-shot forecasting without model retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same retrieval step could be run online so that the database grows with newly observed series.
  • RAF might mitigate concept drift by preferentially retrieving recent matching examples.
  • Smaller foundation models augmented by RAF could reach performance levels that currently require much larger models.

Load-bearing premise

The retrieved time-series examples are relevant and non-noisy enough that adding them raises accuracy instead of introducing harmful context or distribution shift.

What would settle it

A controlled test in which deliberately irrelevant or noisy retrieved series are supplied and forecast error rises above the no-retrieval baseline.

Figures

Figures reproduced from arXiv: 2411.08249 by Ege Onur Taga, Kutay Tire, Muhammed Emrullah Ildiz, Samet Oymak.

Figure 1
Figure 1. Figure 1: Overview of the Retrieval Augmented Forecasting (RAF) framework. Top left: The original query is used to retrieve the best-matching time series (RTS 1, RTS 2, RTS 3, . . . ). Bottom left: We utilize the best match (RTS 1) to form the retrieved context and retrieved future. Bottom right: These segments are then augmented with the original time series to produce an augmented input for forecasting. Top right … view at source ↗
Figure 2
Figure 2. Figure 2: We generated synthetic time-series data by transposing two sinusoidal signals and project￾ing them via orthogonal projections. We assessed extrapolation behavior using scaled mean squared error (assuming 0 prediction as baseline) and chose a context and forecast length of C = 30 and H = 30. Evaluations were conducted on Chronos- {mini, small, base}. The TS-R task is inspired in part from the as￾sociative r… view at source ↗
Figure 3
Figure 3. Figure 3: Aggregated Relative WQL performance for Chronos Mini and Chronos Base across [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Aggregated Relative MASE performance for Chronos Mini and Chronos Base across [PITH_FULL_IMAGE:figures/full_fig_p023_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative results for Benchmark I datasets with [PITH_FULL_IMAGE:figures/full_fig_p024_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative results for Benchmark II datasets with [PITH_FULL_IMAGE:figures/full_fig_p025_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Qualitative results for Benchmark I datasets with [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Qualitative results for Benchmark II datasets with [PITH_FULL_IMAGE:figures/full_fig_p026_8.png] view at source ↗
read the original abstract

Retrieval-augmented generation (RAG) is a central component of modern LLM systems, particularly in scenarios where up-to-date information is crucial for accurately responding to user queries or when queries exceed the scope of the training data. The advent of time-series foundation models (TSFM), such as Chronos, and the need for effective zero-shot forecasting performance across various time-series domains motivates the question: Do benefits of RAG similarly carry over to time series forecasting? In this paper, we advocate that the dynamic and event-driven nature of time-series data makes RAG a crucial component of TSFMs and introduce a principled RAG framework for time-series forecasting, called Retrieval Augmented Forecasting (RAF). Within RAF, we develop efficient strategies for retrieving related time-series examples and incorporating them into forecast. Through experiments and mechanistic studies, we demonstrate that RAF indeed improves the forecasting accuracy across diverse time series domains and the improvement is more significant for larger TSFM sizes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Retrieval Augmented Forecasting (RAF), a RAG framework for time-series foundation models (TSFMs) such as Chronos. It develops retrieval strategies for related time-series examples and their incorporation into zero-shot forecasts, claiming via experiments and mechanistic studies that RAF improves accuracy across diverse domains with larger gains for bigger TSFM sizes.

Significance. If the results hold with proper verification of retrieval quality, the work would be significant for extending RAG benefits to non-stationary time-series forecasting and highlighting scale-dependent advantages in TSFMs.

major comments (3)
  1. [Abstract] Abstract: the central claim that RAF improves accuracy (and more for larger TSFMs) rests on unverified retrieval quality, yet the abstract supplies no information on baselines, datasets, statistical significance, or controls for retrieval failure modes such as distribution shift from non-stationary mismatched series.
  2. [Experiments] Experiments section: without explicit ablations or tests injecting noisy/irrelevant retrieved examples (e.g., via perturbed similarity metrics), gains cannot be attributed to RAF rather than input length or prompting artifacts, undermining the attribution to relevant context.
  3. [Mechanistic studies] Mechanistic studies: these must demonstrate that larger models better exploit retrieved patterns without overfitting noise; absent such controls, the scale-dependent improvement claim lacks support given time-series event-driven variability.
minor comments (2)
  1. Clarify the exact similarity metric and incorporation method (e.g., concatenation vs. attention) in the RAF framework description.
  2. Add missing references to prior RAG work in LLMs and existing TSFM baselines for context.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to strengthen the presentation of experimental details, attribution of gains, and mechanistic analysis. We address each major comment below and commit to revisions that incorporate additional controls and clarifications without misrepresenting our existing results.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that RAF improves accuracy (and more for larger TSFMs) rests on unverified retrieval quality, yet the abstract supplies no information on baselines, datasets, statistical significance, or controls for retrieval failure modes such as distribution shift from non-stationary mismatched series.

    Authors: We agree the abstract is concise and would benefit from additional context. In the revision, we will expand it to briefly note the datasets (multi-domain TSFM benchmarks), baselines (zero-shot TSFM forecasts), statistical significance of improvements, and mention of retrieval quality controls (e.g., similarity thresholds and failure mode checks) already present in the main text and appendix. This will better support the central claim without altering its substance. revision: yes

  2. Referee: [Experiments] Experiments section: without explicit ablations or tests injecting noisy/irrelevant retrieved examples (e.g., via perturbed similarity metrics), gains cannot be attributed to RAF rather than input length or prompting artifacts, undermining the attribution to relevant context.

    Authors: This is a valid point; our current experiments include relevant vs. zero-shot comparisons but lack explicit noise-injection ablations. We will add these in the revised experiments section, including tests with perturbed similarity metrics and random/irrelevant retrieval to show performance degradation and confirm attribution to relevant context rather than length or prompting effects. revision: yes

  3. Referee: [Mechanistic studies] Mechanistic studies: these must demonstrate that larger models better exploit retrieved patterns without overfitting noise; absent such controls, the scale-dependent improvement claim lacks support given time-series event-driven variability.

    Authors: We acknowledge the need for stronger controls here. The existing mechanistic analysis shows scaling trends and attention patterns, but to directly test exploitation without noise overfitting, we will augment the section with comparisons of relevant vs. irrelevant retrieval across model sizes and analysis of how larger models discriminate patterns amid event-driven variability. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework validated by experiments, no derivations or self-referential fits.

full rationale

The paper proposes Retrieval Augmented Forecasting (RAF) as a practical framework for incorporating retrieved time-series examples into TSFM inference. All central claims of accuracy improvement are presented as outcomes of experiments and mechanistic studies across domains, with no equations, parameter fits, or derivations that reduce the reported gains to quantities defined by the same inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes; the work contains no mathematical derivation chain at all. The reader's assessment of score 2.0 is consistent with an honest non-finding for an empirical contribution whose soundness depends on experimental controls rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Review performed on abstract only; no details on free parameters, axioms, or invented entities are available.

invented entities (1)
  • RAF framework no independent evidence
    purpose: Augment time-series foundation models with retrieved similar series
    Introduced in the abstract as the central contribution

pith-pipeline@v0.9.0 · 5699 in / 1115 out tokens · 25473 ms · 2026-05-23T16:55:31.907874+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 1 internal anchor

  1. [1]

    Maddix, Hao Wang, Michael W

    Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Yuyang Wang. Chronos: Learning the language ...

  2. [2]

    George Athanasopoulos, Rob Hyndman, Haiyan Song, and Doris C. Wu. The tourism forecasting competition. International Journal of Forecasting, 27(3):822–844, 2011

  3. [3]

    Meme suite: tools for motif discovery and searching.Nucleic acids research, 37(suppl_2):W202–W208, 2009

    Timothy L Bailey, Mikael Boden, Fabian A Buske, Martin Frith, Charles E Grant, Luca Clementi, Jingyuan Ren, Wilfred W Li, and William S Noble. Meme suite: tools for motif discovery and searching.Nucleic acids research, 37(suppl_2):W202–W208, 2009

  4. [4]

    Improving language models by retrieving from trillions of tokens

    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego De Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, ...

  5. [5]

    Arik, and Tomas Pfister

    Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O. Arik, and Tomas Pfister. Tsmixer: An all-mlp architecture for time series forecasting, 2023

  6. [6]

    Forecastpfn: Synthetically-trained zero-shot forecasting

    Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha V Naidu, and Colin White. Forecastpfn: Synthetically-trained zero-shot forecasting. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 2403–2426. Curran Associates, Inc., 2023

  7. [7]

    Adarnn: Adaptive learning and forecasting of time series, 2021

    Yuntao Du, Jindong Wang, Wenjie Feng, Sinno Pan, Tao Qin, Renjun Xu, and Chongjun Wang. Adarnn: Adaptive learning and forecasting of time series, 2021

  8. [8]

    Augmenting transformers with knn-based composite memory for dialog.Transactions of the Association for Computational Linguistics, 9:82–99, 2021

    Angela Fan, Claire Gardent, Chloé Braud, and Antoine Bordes. Augmenting transformers with knn-based composite memory for dialog.Transactions of the Association for Computational Linguistics, 9:82–99, 2021

  9. [9]

    Timegpt-1, 2024

    Azul Garza, Cristian Challu, and Max Mergenthaler-Canseco. Timegpt-1, 2024

  10. [10]

    Webb, Rob J

    Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive, 2021

  11. [11]

    Mamba: Linear-time sequence modeling with selective state spaces, 2024

    Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces, 2024

  12. [12]

    Realm: Retrieval-augmented language model pre-training, 2020

    Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. Realm: Retrieval-augmented language model pre-training, 2020

  13. [13]

    Hyndman and Anne B

    Rob J. Hyndman and Anne B. Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4):679–688, 2006

  14. [14]

    Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

    Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering.arXiv preprint arXiv:2007.01282, 2020

  15. [15]

    Domain adaptation for time series forecasting via attention sharing

    Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, and Yuyang Wang. Domain adaptation for time series forecasting via attention sharing. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learn...

  16. [16]

    Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models, 2020

  17. [17]

    Baleen: Robust multi-hop reasoning at scale via condensed retrieval, 2022

    Omar Khattab, Christopher Potts, and Matei Zaharia. Baleen: Robust multi-hop reasoning at scale via condensed retrieval, 2022

  18. [18]

    Colbert: Efficient and effective passage search via con- textualized late interaction over bert

    Omar Khattab and Matei Zaharia. Colbert: Efficient and effective passage search via con- textualized late interaction over bert. InProceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pages 39–48, 2020. 12

  19. [19]

    Reversible instance normalization for accurate time-series forecasting against distribution shift

    Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations, 2022

  20. [20]

    Alphacode 2 technical report

    Lemi Leblond et al. Alphacode 2 technical report. Technical report, DeepMind, 2023

  21. [21]

    Latent retrieval for weakly supervised open domain question answering, 2019

    Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. Latent retrieval for weakly supervised open domain question answering, 2019

  22. [22]

    Retrieval-augmented generation for knowledge-intensive nlp tasks, 2021

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks, 2021

  23. [23]

    A survey on retrieval-augmented text generation

    Huayang Li, Yixuan Su, Deng Cai, Yan Wang, and Lemao Liu. A survey on retrieval-augmented text generation. arXiv preprint arXiv:2202.01110, 2022

  24. [24]

    Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting

    Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran As...

  25. [25]

    Foundation models for time series analysis: A tutorial and survey

    Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. Foundation models for time series analysis: A tutorial and survey. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, volume 619 ofKDD ’24, page 6555–6565. ACM, August 2024

  26. [26]

    Arik, Nicolas Loeff, and Tomas Pfister

    Bryan Lim, Sercan O. Arik, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting.International Journal of Forecasting, 37(4):1748–1764, 2021

  27. [27]

    Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting

    Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X Liu, and Schahram Dustdar. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In International Conference on Learning Representations, 2022

  28. [28]

    itransformer: Inverted transformers are effective for time series forecasting, 2024

    Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting, 2024

  29. [29]

    Query rewrit- ing for retrieval-augmented large language models,

    Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, and Nan Duan. Query rewriting for retrieval-augmented large language models.arXiv preprint arXiv:2305.14283, 2023

  30. [30]

    Accuracy of forecasting: An empirical investigation

    Spyros Makridakis, Michèle Hibon, and Claus Moser. Accuracy of forecasting: An empirical investigation. Journal of the Royal Statistical Society. Series A (General), 142(2):97–145, 1979

  31. [31]

    Dynamic time warping.Information retrieval for music and motion, pages 69–84, 2007

    Meinard Müller. Dynamic time warping.Information retrieval for music and motion, pages 69–84, 2007

  32. [32]

    Müller, N

    Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do bayesian inference.arXiv preprint arXiv:2112.10510, 2021

  33. [33]

    Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam

    Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. InInternational Conference on Learning Representations, 2023. 13

  34. [34]

    A time series is worth 64 words: Long-term forecasting with transformers

    Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. InThe Eleventh International Conference on Learning Representations, 2023

  35. [35]

    Can generalist foundation models outcompete special-purpose tuning? case study in medicine, 2023

    Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, and Eric Horvitz. Can generalist foundation models outcompete special-purpose tuning? case study in medicine, 2023

  36. [36]

    In-context learning and induction heads, 2022

    Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, a...

  37. [37]

    Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio

    Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. Meta-learning framework with applications to zero-shot time-series forecasting, 2020

  38. [38]

    Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks, 2020

    Bernardo Pérez Orozco and Stephen J Roberts. Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks, 2020

  39. [39]

    Anjos, Sebastian Lautz, and Aleksandar Kolev

    Egon Persak, Miguel F. Anjos, Sebastian Lautz, and Aleksandar Kolev. Multiple-resolution tokenization for time series forecasting with an application to pricing, 2024

  40. [40]

    Language models are unsupervised multitask learners, 2019

    Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners, 2019

  41. [41]

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer, 2023

  42. [42]

    Lag-llama: Towards foundation models for probabilistic time series forecasting, 2024

    Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, and Irina Rish. Lag-llama: Towards foundation models ...

  43. [43]

    Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting, 2021

    Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting, 2021

  44. [44]

    Agentic retrieval- augmented generation for time series analysis.arXiv preprint arXiv:2408.14484, 2024

    Chidaksh Ravuru, Sagar Srinivas Sakhinana, and Venkataramana Runkana. Agentic retrieval- augmented generation for time series analysis.arXiv preprint arXiv:2408.14484, 2024

  45. [45]

    Deepar: Probabilis- tic forecasting with autoregressive recurrent networks.International Journal of Forecasting, 36(3):1181–1191, 2020

    David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. Deepar: Probabilis- tic forecasting with autoregressive recurrent networks.International Journal of Forecasting, 36(3):1181–1191, 2020

  46. [46]

    Retrieval-augmented mining of temporal logic specifications from data

    Gaia Saveri and Luca Bortolussi. Retrieval-augmented mining of temporal logic specifications from data. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 315–331. Springer, 2024. 14

  47. [47]

    Roformer: Enhanced transformer with rotary position embedding, 2023

    Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2023

  48. [48]

    Timepfn: Effective multivariate time series forecasting with synthetic data

    Ege Onur Taga, Muhammed Emrullah Ildiz, and Samet Oymak. Timepfn: Effective multivariate time series forecasting with synthetic data. InNeurIPS Workshop on Time Series in the Age of Large Models, 2024

  49. [49]

    Totem: Tokenized time series embeddings for general time series analysis, 2024

    Sabera Talukder, Yisong Yue, and Georgia Gkioxari. Totem: Tokenized time series embeddings for general time series analysis, 2024

  50. [50]

    Instance normalization: The missing ingredient for fast stylization, 2017

    Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization, 2017

  51. [51]

    Ratsf: Empowering customer service volume management through retrieval-augmented time-series forecasting.arXiv preprint arXiv:2403.04180, 2024

    Tianfeng Wang and Gaojie Cui. Ratsf: Empowering customer service volume management through retrieval-augmented time-series forecasting.arXiv preprint arXiv:2403.04180, 2024

  52. [52]

    Deep factors for forecasting

    Yuyang Wang, Alex Smola, Danielle Maddix, Jan Gasthaus, Dean Foster, and Tim Januschowski. Deep factors for forecasting. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 6607–6617. PMLR, 09–15 Jun 2019

  53. [53]

    Unified training of universal time series forecasting transformers, 2024

    Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers, 2024

  54. [54]

    Autoformer: Decomposition transformers with Auto-Correlation for long-term series forecasting

    Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with Auto-Correlation for long-term series forecasting. InAdvances in Neural Information Processing Systems, 2021

  55. [55]

    Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets

    Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM), pages 1317–1322. Ieee, 2016

  56. [56]

    Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting

    Yunhao Zhang and Junchi Yan. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. InInternational Conference on Learning Representations, 2023

  57. [57]

    Informer: Beyond efficient transformer for long sequence time-series forecasting

    Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, volume 35, pages 11106–11115. AAAI Press, 2021

  58. [58]

    FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting

    Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. InProc. 39th International Conference on Machine Learning (ICML 2022), 2022. 15 A Theoretical Results A.1 TS-R Problem In Section 2, we asserted that a two layer transformer architecture can solve th...

  59. [59]

    16 Proof 1 Realize that withW1, the positional encodings will be ignored and only token embeddings will remain

    Set W2 = c.Φ⊥RΦ⊥ As c → ∞, we haveg(f2(f1(xL−C+1))) → Υ. 16 Proof 1 Realize that withW1, the positional encodings will be ignored and only token embeddings will remain. Moreover, N XtrΦ xL−C+1 ∥xL−C+1∥ℓ2 will return a vector v of size L − C with the largest element at j, for index j, corresponding to the matching retrieval motif. Asc → ∞, the softmax will...

  60. [60]

    NN5 dataset ([10]) consists of 111 daily time series of cash withdrawals from Automated Teller Machines (ATMs) in the UK, and was utilized in the NN5 forecasting competition

    The data was sourced from the Johns Hopkins repository. NN5 dataset ([10]) consists of 111 daily time series of cash withdrawals from Automated Teller Machines (ATMs) in the UK, and was utilized in the NN5 forecasting competition. E.2 Benchmark II Datasets Tourism dataset ([10, 2]), derived from a Kaggle competition, includes 366 monthly and 427 quarterly...