Retrieval Augmented Time Series Forecasting

Ege Onur Taga; Kutay Tire; Muhammed Emrullah Ildiz; Samet Oymak

arxiv: 2411.08249 · v2 · submitted 2024-11-12 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-23 16:55 UTC · model grok-4.3


keywords: retrieval augmented forecasting · time series foundation models · zero-shot forecasting · RAG for time series · forecast accuracy · Chronos


The pith

Retrieving similar past time series and feeding them into foundation models raises zero-shot forecasting accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Time-series foundation models struggle with zero-shot forecasting on dynamic, event-driven data that may lie outside their training distribution. The paper asks whether retrieval-augmented generation, already useful for language models, can be adapted to supply relevant past examples and improve predictions. It introduces Retrieval Augmented Forecasting (RAF) together with concrete retrieval and incorporation strategies. Experiments across domains show accuracy gains that grow larger as the underlying foundation model size increases. A reader would care because this offers a practical way to boost performance without retraining or enlarging the base model.

Core claim

Retrieval Augmented Forecasting (RAF) is a framework that retrieves related time-series examples and incorporates them into the input of time-series foundation models; this procedure improves forecasting accuracy across diverse domains, and the gains become larger for bigger TSFM sizes.

What carries the argument

Retrieval Augmented Forecasting (RAF) framework, which selects related time-series examples and augments the model input with them.
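
The retrieve-then-augment step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes z-normalized Euclidean distance as the similarity measure and plain concatenation as the way retrieved segments are added to the input, and the function names `znorm`, `retrieve_best_match` and `augment_input` are hypothetical. The foundation model call itself is left out.

```python
import numpy as np

def znorm(x, eps=1e-8):
    """Z-normalize a 1-D series so matching ignores offset and scale."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + eps)

def retrieve_best_match(query, database, horizon):
    """Slide a window over every database series and return the
    (context, future) pair whose context is closest to the query.

    Assumption: similarity is z-normalized Euclidean distance.
    """
    c = len(query)
    q = znorm(query)
    best_dist, best = np.inf, None
    for series in database:
        series = np.asarray(series, dtype=float)
        # Each candidate window must leave room for `horizon` future points.
        for start in range(len(series) - c - horizon + 1):
            window = series[start:start + c]
            dist = np.linalg.norm(znorm(window) - q)
            if dist < best_dist:
                best_dist = dist
                best = (window, series[start + c:start + c + horizon])
    if best is None:
        raise ValueError("no database series is long enough")
    return best

def augment_input(query, retrieved_context, retrieved_future):
    """Assumption: augmentation is concatenation, retrieved segments first."""
    return np.concatenate([retrieved_context, retrieved_future, query])
```

With a query of length 30 and a horizon of 10, the augmented input has 70 points and would then be passed to the forecaster in place of the raw query.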

If this is right

RAF delivers measurable accuracy lifts on many different time-series domains.
The accuracy improvement scales up with the size of the underlying time-series foundation model.
The approach directly targets the dynamic and event-driven character of time-series data.
It provides a route to stronger zero-shot forecasting without model retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same retrieval step could be run online so that the database grows with newly observed series.
RAF might mitigate concept drift by preferentially retrieving recent matching examples.
Smaller foundation models augmented by RAF could reach performance levels that currently require much larger models.

Load-bearing premise

The retrieved time-series examples are relevant and non-noisy enough that adding them raises accuracy instead of introducing harmful context or distribution shift.

What would settle it

A controlled test in which deliberately irrelevant or noisy retrieved series are supplied and forecast error rises above the no-retrieval baseline.
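
One way such a control could be organized is sketched below. The forecaster is supplied by the caller; the `forecast_fn(context, horizon)` interface, the name `retrieval_control`, and the three conditions are hypothetical choices for illustration and are not taken from the paper's protocol.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two equal-length arrays."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

def retrieval_control(forecast_fn, query, target, relevant, irrelevant):
    """Compare forecast error under three input conditions.

    forecast_fn(context, horizon) -> array of length `horizon`
    (hypothetical interface standing in for a foundation model).
    `relevant` and `irrelevant` are 1-D arrays prepended to the query.
    """
    horizon = len(target)
    conditions = {
        "no_retrieval": np.asarray(query, dtype=float),
        "relevant": np.concatenate([relevant, query]),
        "irrelevant": np.concatenate([irrelevant, query]),
    }
    return {name: mse(forecast_fn(ctx, horizon), target)
            for name, ctx in conditions.items()}
```

If the error under `irrelevant` rises above `no_retrieval` while `relevant` falls below it, that pattern would support the premise that retrieval quality, rather than input length alone, drives the gains.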

Figures

Figures reproduced from arXiv: 2411.08249 by Ege Onur Taga, Kutay Tire, Muhammed Emrullah Ildiz, Samet Oymak.

**Figure 1.** Overview of the Retrieval Augmented Forecasting (RAF) framework. Top left: The original query is used to retrieve the best-matching time series (RTS 1, RTS 2, RTS 3, ...). Bottom left: We utilize the best match (RTS 1) to form the retrieved context and retrieved future. Bottom right: These segments are then augmented with the original time series to produce an augmented input for forecasting. Top right …

**Figure 2.** We generated synthetic time-series data by transposing two sinusoidal signals and projecting them via orthogonal projections. We assessed extrapolation behavior using scaled mean squared error (assuming 0 prediction as baseline) and chose a context and forecast length of C = 30 and H = 30. Evaluations were conducted on Chronos-{mini, small, base}. The TS-R task is inspired in part from the associative r…

**Figure 3.** Aggregated Relative WQL performance for Chronos Mini and Chronos Base across …

**Figure 4.** Aggregated Relative MASE performance for Chronos Mini and Chronos Base across …

**Figure 5.** Qualitative results for Benchmark I datasets with …

**Figure 6.** Qualitative results for Benchmark II datasets with …

**Figure 7.** Qualitative results for Benchmark I datasets with …

**Figure 8.** Qualitative results for Benchmark II datasets with …
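
The Figure 2 caption mentions a scaled mean squared error with the zero prediction as baseline. One plausible reading, sketched here as an assumption because the caption does not give the formula, divides the forecast MSE by the MSE of an all-zero forecast. The name `scaled_mse` is hypothetical.

```python
import numpy as np

def scaled_mse(pred, target):
    """MSE of the forecast divided by the MSE of predicting zero everywhere.

    Assumed reading of "scaled MSE (assuming 0 prediction as baseline)".
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    baseline = np.mean(target ** 2)  # MSE of the all-zero forecast
    if baseline == 0:
        raise ValueError("target is identically zero; scaled MSE is undefined")
    return float(np.mean((pred - target) ** 2) / baseline)
```

Under this reading a value of 1 means the forecast is no better than predicting zero, and 0 means a perfect forecast.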

Original abstract

Retrieval-augmented generation (RAG) is a central component of modern LLM systems, particularly in scenarios where up-to-date information is crucial for accurately responding to user queries or when queries exceed the scope of the training data. The advent of time-series foundation models (TSFM), such as Chronos, and the need for effective zero-shot forecasting performance across various time-series domains motivates the question: Do benefits of RAG similarly carry over to time series forecasting? In this paper, we advocate that the dynamic and event-driven nature of time-series data makes RAG a crucial component of TSFMs and introduce a principled RAG framework for time-series forecasting, called Retrieval Augmented Forecasting (RAF). Within RAF, we develop efficient strategies for retrieving related time-series examples and incorporating them into forecast. Through experiments and mechanistic studies, we demonstrate that RAF indeed improves the forecasting accuracy across diverse time series domains and the improvement is more significant for larger TSFM sizes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RAF applies retrieval to time-series foundation models and claims bigger gains for larger ones, but the evidence hinges on untested retrieval quality.


The main point is that this paper introduces Retrieval Augmented Forecasting (RAF), a framework that pulls in similar past time series to improve zero-shot forecasts from models like Chronos. It argues this helps because time series are event-driven and often fall outside training distributions, and it reports larger benefits as model size grows. That is the core new idea: treating retrieval as a standard component for TSFMs rather than an afterthought.

The motivation lines up with real use cases where retraining per domain is impractical. The paper does a reasonable job laying out why RAG-style methods might transfer and sketches retrieval strategies plus ways to feed the examples into the forecast step. Those pieces are concrete enough to build on.

The soft spot is exactly the one the stress-test flags. The abstract states that experiments and mechanistic studies back the accuracy gains, yet supplies no numbers on baselines, datasets, statistical tests, or checks that the retrieved series actually help rather than add noise or shift. Time series similarity is easy to get wrong when patterns are non-stationary, and larger models could latch onto misleading context. Without explicit controls or ablations on retrieval failure, the claimed improvements cannot be confidently attributed to RAF. If the full paper contains those controls and shows they hold across domains, the result strengthens; on the abstract alone the link remains the weakest part.

This work is aimed at researchers building or applying time-series foundation models who need better zero-shot behavior. Anyone already experimenting with retrieval in other modalities will find the adaptation straightforward to evaluate. It is worth sending to peer review because the idea is timely and the framework is simple enough to test quickly, even if the current evidence needs tightening on the retrieval-quality question.

Referee Report

3 major / 2 minor

Summary. The paper introduces Retrieval Augmented Forecasting (RAF), a RAG framework for time-series foundation models (TSFMs) such as Chronos. It develops retrieval strategies for related time-series examples and their incorporation into zero-shot forecasts, claiming via experiments and mechanistic studies that RAF improves accuracy across diverse domains with larger gains for bigger TSFM sizes.

Significance. If the results hold with proper verification of retrieval quality, the work would be significant for extending RAG benefits to non-stationary time-series forecasting and highlighting scale-dependent advantages in TSFMs.

major comments (3)

[Abstract] The central claim that RAF improves accuracy (and more for larger TSFMs) rests on unverified retrieval quality, yet the abstract supplies no information on baselines, datasets, statistical significance, or controls for retrieval failure modes such as distribution shift from non-stationary mismatched series.
[Experiments] Without explicit ablations or tests injecting noisy/irrelevant retrieved examples (e.g., via perturbed similarity metrics), gains cannot be attributed to RAF rather than input length or prompting artifacts, undermining the attribution to relevant context.
[Mechanistic studies] These must demonstrate that larger models better exploit retrieved patterns without overfitting noise; absent such controls, the scale-dependent improvement claim lacks support given time-series event-driven variability.

minor comments (2)

Clarify the exact similarity metric and incorporation method (e.g., concatenation vs. attention) in the RAF framework description.
Add missing references to prior RAG work in LLMs and existing TSFM baselines for context.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to strengthen the presentation of experimental details, attribution of gains, and mechanistic analysis. We address each major comment below and commit to revisions that incorporate additional controls and clarifications without misrepresenting our existing results.


Referee: [Abstract] The central claim that RAF improves accuracy (and more for larger TSFMs) rests on unverified retrieval quality, yet the abstract supplies no information on baselines, datasets, statistical significance, or controls for retrieval failure modes such as distribution shift from non-stationary mismatched series.

Authors: We agree the abstract is concise and would benefit from additional context. In the revision, we will expand it to briefly note the datasets (multi-domain TSFM benchmarks), baselines (zero-shot TSFM forecasts), statistical significance of improvements, and mention of retrieval quality controls (e.g., similarity thresholds and failure mode checks) already present in the main text and appendix. This will better support the central claim without altering its substance. Revision: yes.

Referee: [Experiments] Without explicit ablations or tests injecting noisy/irrelevant retrieved examples (e.g., via perturbed similarity metrics), gains cannot be attributed to RAF rather than input length or prompting artifacts, undermining the attribution to relevant context.

Authors: This is a valid point; our current experiments include relevant vs. zero-shot comparisons but lack explicit noise-injection ablations. We will add these in the revised experiments section, including tests with perturbed similarity metrics and random/irrelevant retrieval to show performance degradation and confirm attribution to relevant context rather than length or prompting effects. Revision: yes.

Referee: [Mechanistic studies] These must demonstrate that larger models better exploit retrieved patterns without overfitting noise; absent such controls, the scale-dependent improvement claim lacks support given time-series event-driven variability.

Authors: We acknowledge the need for stronger controls here. The existing mechanistic analysis shows scaling trends and attention patterns, but to directly test exploitation without noise overfitting, we will augment the section with comparisons of relevant vs. irrelevant retrieval across model sizes and analysis of how larger models discriminate patterns amid event-driven variability. Revision: yes.

Circularity Check

0 steps flagged

No circularity: empirical framework validated by experiments, no derivations or self-referential fits.


The paper proposes Retrieval Augmented Forecasting (RAF) as a practical framework for incorporating retrieved time-series examples into TSFM inference. All central claims of accuracy improvement are presented as outcomes of experiments and mechanistic studies across domains, with no equations, parameter fits, or derivations that reduce the reported gains to quantities defined by the same inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes; the work contains no mathematical derivation chain at all. The reader's assessment of score 2.0 is consistent with an honest non-finding for an empirical contribution whose soundness depends on experimental controls rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Review performed on abstract only; no details on free parameters, axioms, or invented entities are available.

invented entities (1)

RAF framework (no independent evidence)
purpose: Augment time-series foundation models with retrieved similar series
Introduced in the abstract as the central contribution

pith-pipeline@v0.9.0 · 5699 in / 1115 out tokens · 25473 ms · 2026-05-23T16:55:31.907874+00:00 · methodology


Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 1 internal anchor

[1]

Maddix, Hao Wang, Michael W

Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Yuyang Wang. Chronos: Learning the language ...

work page 2024
[2]

George Athanasopoulos, Rob Hyndman, Haiyan Song, and Doris C. Wu. The tourism forecasting competition. International Journal of Forecasting, 27(3):822–844, 2011

work page 2011
[3]

Meme suite: tools for motif discovery and searching.Nucleic acids research, 37(suppl_2):W202–W208, 2009

Timothy L Bailey, Mikael Boden, Fabian A Buske, Martin Frith, Charles E Grant, Luca Clementi, Jingyuan Ren, Wilfred W Li, and William S Noble. Meme suite: tools for motif discovery and searching.Nucleic acids research, 37(suppl_2):W202–W208, 2009

work page 2009
[4]

Improving language models by retrieving from trillions of tokens

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego De Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, ...

work page 2022
[5]

Arik, and Tomas Pfister

Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O. Arik, and Tomas Pfister. Tsmixer: An all-mlp architecture for time series forecasting, 2023

work page 2023
[6]

Forecastpfn: Synthetically-trained zero-shot forecasting

Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha V Naidu, and Colin White. Forecastpfn: Synthetically-trained zero-shot forecasting. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 2403–2426. Curran Associates, Inc., 2023

work page 2023
[7]

Adarnn: Adaptive learning and forecasting of time series, 2021

Yuntao Du, Jindong Wang, Wenjie Feng, Sinno Pan, Tao Qin, Renjun Xu, and Chongjun Wang. Adarnn: Adaptive learning and forecasting of time series, 2021

work page 2021
[8]

Augmenting transformers with knn-based composite memory for dialog.Transactions of the Association for Computational Linguistics, 9:82–99, 2021

Angela Fan, Claire Gardent, Chloé Braud, and Antoine Bordes. Augmenting transformers with knn-based composite memory for dialog.Transactions of the Association for Computational Linguistics, 9:82–99, 2021

work page 2021
[9]

Timegpt-1, 2024

Azul Garza, Cristian Challu, and Max Mergenthaler-Canseco. Timegpt-1, 2024

work page 2024
[10]

Webb, Rob J

Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive, 2021

work page 2021
[11]

Mamba: Linear-time sequence modeling with selective state spaces, 2024

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces, 2024

work page 2024
[12]

Realm: Retrieval-augmented language model pre-training, 2020

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. Realm: Retrieval-augmented language model pre-training, 2020

work page 2020
[13]

Hyndman and Anne B

Rob J. Hyndman and Anne B. Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4):679–688, 2006

work page 2006
[14]

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering.arXiv preprint arXiv:2007.01282, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2007
[15]

Domain adaptation for time series forecasting via attention sharing

Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, and Yuyang Wang. Domain adaptation for time series forecasting via attention sharing. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learn...

work page 2022
[16]

Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models, 2020

work page 2020
[17]

Baleen: Robust multi-hop reasoning at scale via condensed retrieval, 2022

Omar Khattab, Christopher Potts, and Matei Zaharia. Baleen: Robust multi-hop reasoning at scale via condensed retrieval, 2022

work page 2022
[18]

Colbert: Efficient and effective passage search via con- textualized late interaction over bert

Omar Khattab and Matei Zaharia. Colbert: Efficient and effective passage search via con- textualized late interaction over bert. InProceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pages 39–48, 2020. 12

work page 2020
[19]

Reversible instance normalization for accurate time-series forecasting against distribution shift

Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations, 2022

work page 2022
[20]

Alphacode 2 technical report

Lemi Leblond et al. Alphacode 2 technical report. Technical report, DeepMind, 2023

work page 2023
[21]

Latent retrieval for weakly supervised open domain question answering, 2019

Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. Latent retrieval for weakly supervised open domain question answering, 2019

work page 2019
[22]

Retrieval-augmented generation for knowledge-intensive nlp tasks, 2021

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks, 2021

work page 2021
[23]

A survey on retrieval-augmented text generation

Huayang Li, Yixuan Su, Deng Cai, Yan Wang, and Lemao Liu. A survey on retrieval-augmented text generation. arXiv preprint arXiv:2202.01110, 2022

work page arXiv 2022
[24]

Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting

Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran As...

work page 2019
[25]

Foundation models for time series analysis: A tutorial and survey

Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. Foundation models for time series analysis: A tutorial and survey. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, volume 619 ofKDD ’24, page 6555–6565. ACM, August 2024

work page 2024
[26]

Arik, Nicolas Loeff, and Tomas Pfister

Bryan Lim, Sercan O. Arik, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting.International Journal of Forecasting, 37(4):1748–1764, 2021

work page 2021
[27]

Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting

Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X Liu, and Schahram Dustdar. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In International Conference on Learning Representations, 2022

work page 2022
[28]

itransformer: Inverted transformers are effective for time series forecasting, 2024

Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting, 2024

work page 2024
[29]

Query rewrit- ing for retrieval-augmented large language models,

Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, and Nan Duan. Query rewriting for retrieval-augmented large language models.arXiv preprint arXiv:2305.14283, 2023

work page arXiv 2023
[30]

Accuracy of forecasting: An empirical investigation

Spyros Makridakis, Michèle Hibon, and Claus Moser. Accuracy of forecasting: An empirical investigation. Journal of the Royal Statistical Society. Series A (General), 142(2):97–145, 1979

work page 1979
[31]

Dynamic time warping.Information retrieval for music and motion, pages 69–84, 2007

Meinard Müller. Dynamic time warping.Information retrieval for music and motion, pages 69–84, 2007

work page 2007
[32]

Müller, N

Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do bayesian inference.arXiv preprint arXiv:2112.10510, 2021

work page arXiv 2021
[33]

Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam

Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. InInternational Conference on Learning Representations, 2023. 13

work page 2023
[34]

A time series is worth 64 words: Long-term forecasting with transformers

Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[35]

Can generalist foundation models outcompete special-purpose tuning? case study in medicine, 2023

Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, and Eric Horvitz. Can generalist foundation models outcompete special-purpose tuning? case study in medicine, 2023

work page 2023
[36]

In-context learning and induction heads, 2022

Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, a...

work page 2022
[37]

Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio

Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. Meta-learning framework with applications to zero-shot time-series forecasting, 2020

work page 2020
[38]

Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks, 2020

Bernardo Pérez Orozco and Stephen J Roberts. Zero-shot and few-shot time series forecasting with ordinal regression recurrent neural networks, 2020

work page 2020
[39]

Anjos, Sebastian Lautz, and Aleksandar Kolev

Egon Persak, Miguel F. Anjos, Sebastian Lautz, and Aleksandar Kolev. Multiple-resolution tokenization for time series forecasting with an application to pricing, 2024

work page 2024
[40]

Language models are unsupervised multitask learners, 2019

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners, 2019

work page 2019
[41]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer, 2023

work page 2023
[42]

Lag-llama: Towards foundation models for probabilistic time series forecasting, 2024

Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, and Irina Rish. Lag-llama: Towards foundation models ...

work page 2024
[43]

Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting, 2021

Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting, 2021

work page 2021
[44]

Agentic retrieval- augmented generation for time series analysis.arXiv preprint arXiv:2408.14484, 2024

Chidaksh Ravuru, Sagar Srinivas Sakhinana, and Venkataramana Runkana. Agentic retrieval- augmented generation for time series analysis.arXiv preprint arXiv:2408.14484, 2024

work page arXiv 2024
[45]

Deepar: Probabilis- tic forecasting with autoregressive recurrent networks.International Journal of Forecasting, 36(3):1181–1191, 2020

David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. Deepar: Probabilis- tic forecasting with autoregressive recurrent networks.International Journal of Forecasting, 36(3):1181–1191, 2020

work page 2020
[46]

Retrieval-augmented mining of temporal logic specifications from data

Gaia Saveri and Luca Bortolussi. Retrieval-augmented mining of temporal logic specifications from data. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 315–331. Springer, 2024. 14

work page 2024
[47]

Roformer: Enhanced transformer with rotary position embedding, 2023

Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2023

work page 2023
[48]

Timepfn: Effective multivariate time series forecasting with synthetic data

Ege Onur Taga, Muhammed Emrullah Ildiz, and Samet Oymak. Timepfn: Effective multivariate time series forecasting with synthetic data. InNeurIPS Workshop on Time Series in the Age of Large Models, 2024

work page 2024
[49]

Totem: Tokenized time series embeddings for general time series analysis, 2024

Sabera Talukder, Yisong Yue, and Georgia Gkioxari. Totem: Tokenized time series embeddings for general time series analysis, 2024

work page 2024
[50]

Instance normalization: The missing ingredient for fast stylization, 2017

Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization, 2017

work page 2017
[51]

Ratsf: Empowering customer service volume management through retrieval-augmented time-series forecasting.arXiv preprint arXiv:2403.04180, 2024

Tianfeng Wang and Gaojie Cui. Ratsf: Empowering customer service volume management through retrieval-augmented time-series forecasting.arXiv preprint arXiv:2403.04180, 2024

work page arXiv 2024
[52]

Deep factors for forecasting

Yuyang Wang, Alex Smola, Danielle Maddix, Jan Gasthaus, Dean Foster, and Tim Januschowski. Deep factors for forecasting. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 6607–6617. PMLR, 09–15 Jun 2019

work page 2019
[53]

Unified training of universal time series forecasting transformers, 2024

Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers, 2024

work page 2024
[54]

Autoformer: Decomposition transformers with Auto-Correlation for long-term series forecasting

Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with Auto-Correlation for long-term series forecasting. InAdvances in Neural Information Processing Systems, 2021

work page 2021
[55]

Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets

Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM), pages 1317–1322. Ieee, 2016

work page 2016
[56]

Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting

Yunhao Zhang and Junchi Yan. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. InInternational Conference on Learning Representations, 2023

work page 2023
[57]

Informer: Beyond efficient transformer for long sequence time-series forecasting

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, volume 35, pages 11106–11115. AAAI Press, 2021

work page 2021
[58]

FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting

Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. InProc. 39th International Conference on Machine Learning (ICML 2022), 2022. 15 A Theoretical Results A.1 TS-R Problem In Section 2, we asserted that a two layer transformer architecture can solve th...

work page 2022

