PaP-NF: Probabilistic Long-Term Time Series Forecasting via Prefix-as-Prompt Reprogramming and Normalizing Flows

Minju Kim; Youngbum Hur

arxiv: 2605.23219 · v1 · pith:2HGUNM66new · submitted 2026-05-22 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-25 05:16 UTC · model grok-4.3

keywords probabilistic forecasting · time series · normalizing flows · large language models · prefix-as-prompt · uncertainty quantification · long-term forecasting · CRPS

The pith

Prefix-as-prompt alignment with a frozen LLM conditions a normalizing flow decoder to produce multi-modal probabilistic forecasts for long time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes PaP-NF as a way to move beyond single-point time series predictions by generating full probability distributions that reflect uncertainty. It does this by projecting encoded time series segments into the embedding space of a frozen large language model, prepending learnable prefix vectors as a prompt, pooling the model's output into a global context vector, and feeding that context into a normalizing flow decoder to shape the forecast distributions. The resulting method is tested on long-term forecasting benchmarks using the Continuous Ranked Probability Score and is reported to capture multiple possible futures while keeping ordinary point predictions competitive. A reader would care because many practical decisions depend on knowing the spread and shape of possible outcomes rather than one average value.

Core claim

PaP-NF aligns continuous time series data with a frozen LLM through a Prefix-as-Prompt mechanism, extracts global context from the LLM, and uses that context to condition a normalizing flow decoder, thereby generating predictive distributions that capture multi-modal uncertainty on long-term forecasting benchmarks while preserving competitive point-forecast accuracy.

What carries the argument

The Prefix-as-Prompt mechanism, which prepends learnable prefix vectors to projected time series embeddings so that a frozen LLM can produce the global context that conditions the normalizing flow decoder.
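As a rough illustration of the data flow this mechanism implies, the sketch below follows the steps named in the paper's Figure 1 and Figure 2 captions: segment and linearly encode the series, project into the LLM embedding space, prepend learnable prefixes, run a frozen model, and average-pool its outputs into a context vector. It is not the authors' implementation. The frozen model is replaced by a hypothetical stub (`frozen_llm_stub`), all sizes and weight names are invented for illustration, per-segment encoding is an assumption, and `Fuse` is assumed to be plain concatenation because the source does not define it.

```python
import random

rng = random.Random(0)

def linear(x, weight):
    """Apply a linear map; weight holds one row per output feature."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def rand_matrix(rows, cols):
    """Small random matrix standing in for trainable parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def frozen_llm_stub(tokens):
    """Hypothetical stand-in for a frozen LLM (assumption, not a real model).
    It only mixes each token with the sequence mean so the example runs."""
    d = len(tokens[0])
    mean = [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]
    return [[0.5 * t[i] + 0.5 * mean[i] for i in range(d)] for t in tokens]

# Illustrative sizes only; none of these values come from the paper.
SEG_LEN, D_LOCAL, D_LLM, N_PREFIX = 4, 6, 8, 3

W_enc = rand_matrix(D_LOCAL, SEG_LEN)   # trainable linear encoder
W_proj = rand_matrix(D_LLM, D_LOCAL)    # trainable projection into LLM space
prefix = rand_matrix(N_PREFIX, D_LLM)   # trainable prefix vectors P

def forward(x):
    # 1. Partition the series into segments and encode each one (local features z).
    segments = [x[i:i + SEG_LEN] for i in range(0, len(x), SEG_LEN)]
    z = [linear(seg, W_enc) for seg in segments]
    # 2. Project z into the LLM embedding space and prepend the prefixes.
    e_rep = [linear(zi, W_proj) for zi in z]
    x_llm = prefix + e_rep
    # 3. Run the frozen model and average-pool its outputs into context c.
    hidden = frozen_llm_stub(x_llm)
    c = [sum(hid[i] for hid in hidden) / len(hidden) for i in range(D_LLM)]
    # 4. Fuse local and global information (assumed: concatenation).
    z_pooled = [sum(zi[i] for zi in z) / len(z) for i in range(D_LOCAL)]
    return z_pooled + c

h = forward([float(t % 5) for t in range(16)])
```

In this arrangement the only trainable pieces on the encoder side are `W_enc`, `W_proj`, and `prefix`; the language model itself is never updated, and the resulting vector `h` is what a flow decoder would be conditioned on.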

If this is right

Forecasts will represent ranges of possible futures rather than single values, which matters when environments are uncertain.
Point-forecast accuracy stays at the level of existing deterministic or probabilistic methods on the same benchmarks.
The LLM stays frozen, so its pre-trained knowledge is reused without task-specific retraining of the language model.
The approach is evaluated directly with CRPS, the metric that scores the full predictive distribution rather than just the mean.
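For readers unfamiliar with CRPS, a minimal sample-based estimator is sketched below, using the standard identity CRPS(F, y) = E|X − y| − ½·E|X − X′| with X and X′ drawn independently from the forecast. This is a generic illustration on synthetic numbers, not the paper's evaluation code; the function name and the toy distributions are assumptions. It shows two properties relevant here: a point forecast is scored by its absolute error, and a forecast that keeps both modes scores better than one collapsed to the mean when the outcome lands on a mode.

```python
import random

def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.
    Pairs with i == j are included, which makes the estimate slightly biased."""
    n = len(samples)
    term_obs = sum(abs(x - y) for x in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread

# A point forecast is a one-sample "distribution": CRPS equals absolute error.
point = crps_from_samples([2.0], 3.0)

# Synthetic two-mode forecast versus a forecast collapsed onto the mean.
rng = random.Random(0)
bimodal = ([rng.gauss(-2.0, 0.3) for _ in range(200)]
           + [rng.gauss(2.0, 0.3) for _ in range(200)])
collapsed = [0.0] * 400

y = 2.0  # the outcome lands on one of the two modes
crps_bimodal = crps_from_samples(bimodal, y)
crps_collapsed = crps_from_samples(collapsed, y)
```

Lower is better, so `crps_bimodal` coming out below `crps_collapsed` is the behavior one would expect from a metric that rewards placing probability mass near the realized outcome.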

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same prefix reprogramming step could be tried with other sequence generators when the goal is to import broad context into a specialized decoder.
Because the LLM is not updated, the method may transfer to new time series domains with less data than approaches that retrain the entire model.
If the extracted context proves reliable, similar conditioning could be tested on tasks that require both global knowledge and local uncertainty modeling.

Load-bearing premise

The global context extracted by the frozen LLM from the prefix-as-prompt alignment provides effective conditioning information for the normalizing flow decoder on time series data.

What would settle it

The claim would be undermined if, on the standard long-term forecasting benchmarks, PaP-NF produced CRPS scores no better than strong baselines, or if its output distributions assigned low probability to the actual multi-modal patterns seen in held-out test data.

Figures

Figures reproduced from arXiv: 2605.23219 by Minju Kim, Youngbum Hur.

**Figure 1.** Overview of the PaP-NF framework. Given input time series X, a linear encoder extracts localized temporal patterns as z. Learnable prefixes P align z with a frozen LLM, which produces a global context vector c via average pooling. The fused representation h = Fuse(z, c) conditions a normalizing flow to generate the forecast distribution.

**Figure 2.** Detailed illustration of the temporal encoding and prompt-based reprogramming process. The input time series X is partitioned into segments and flattened into a numerical vector z via a linear layer, projected into the LLM token embedding space to obtain E_rep, and concatenated with learnable prefix vectors P to form the aligned input X_LLM for the frozen LLM.

**Figure 3.** Illustration of the normalizing flow framework. A simple base distribution u_0 ∼ N(0, I) is transformed into a complex target distribution u_{T_flow} through a sequence of invertible mappings f_1, …, f_T, enabling flexible distribution modeling and efficient sampling of diverse future trajectories.

**Figure 4.** Qualitative comparison on ETTm2 (H = 720). PaP-NF generates prediction intervals versus deterministic point forecasts. Stars mark time steps where the deterministic baseline exhibits the highest absolute errors (top 10%). PaP-NF captures these challenging points within its 90% prediction intervals, illustrating its ability to model uncertainty and manage high-risk regions.
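The check that Figure 4 describes can be phrased as a small computation: find the time steps where a deterministic forecast has its largest absolute errors (top 10%), then ask how often the true value at those steps falls inside the 90% interval of the sampled trajectories. The sketch below does this on synthetic data; the helper names, the interpolation rule for quantiles, and the random inputs are assumptions, and nothing here reproduces the paper's ETTm2 results.

```python
import random

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def interval_90(samples):
    """Central 90% interval: 5th and 95th percentiles of the samples."""
    s = sorted(samples)
    return quantile(s, 0.05), quantile(s, 0.95)

def hardest_steps(point_forecast, truth, top_frac=0.10):
    """Indices of the steps with the largest absolute point-forecast errors."""
    errors = [abs(p - y) for p, y in zip(point_forecast, truth)]
    k = max(1, int(round(top_frac * len(errors))))
    order = sorted(range(len(errors)), key=lambda t: errors[t], reverse=True)
    return order[:k]

def coverage(sample_paths, truth, steps):
    """Fraction of the given steps whose true value lies in the 90% interval."""
    hits = 0
    for t in steps:
        lo, hi = interval_90([path[t] for path in sample_paths])
        hits += lo <= truth[t] <= hi
    return hits / len(steps)

# Synthetic stand-ins for a test window (assumption: not real benchmark data).
rng = random.Random(0)
H = 40
truth = [rng.gauss(0.0, 1.0) for _ in range(H)]
point_forecast = [0.0] * H
sample_paths = [[rng.gauss(0.0, 1.0) for _ in range(H)] for _ in range(200)]

steps = hardest_steps(point_forecast, truth)
cov = coverage(sample_paths, truth, steps)
```

Reporting `cov` separately for the hardest steps and for all steps would show whether the intervals stay calibrated exactly where the point forecast fails, which is the property the figure is meant to illustrate.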

Original abstract

Time series forecasting plays a central role in many real-world applications and has been extensively studied. Most existing approaches rely on deterministic models. However, real-world environments exhibit inherently uncertain and complex future behaviors, making single-point predictions insufficient. This highlights the need for probabilistic forecasting methods that can quantify and represent uncertainty. In this work, we propose PaP-NF, a probabilistic forecasting framework that aligns continuous time series representations with a frozen large language model (LLM) using a Prefix-as-Prompt mechanism, and conditions a normalizing flow decoder on the global context extracted by the LLM. The quality of the resulting predictive distributions is evaluated using the Continuous Ranked Probability Score (CRPS), a standard metric in probabilistic forecasting. Across a variety of long-term forecasting benchmarks, PaP-NF robustly captures multi-modal uncertainty while maintaining competitive point forecasting accuracy. The official implementation is available at: https://github.com/democracy04/PaP-NF

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

PaP-NF combines prefix-as-prompt reprogramming of a frozen LLM with a conditioned normalizing flow for probabilistic time series forecasting, but the abstract gives no numbers so the actual gains are unclear.

PaP-NF is a new framework that takes time series data, aligns it to a frozen LLM via prefix-as-prompt, pulls out global context, and uses that to condition a normalizing flow for generating probabilistic forecasts. What stands out is the specific way they integrate the reprogramming step with the flow model. The idea of using a frozen LLM to provide conditioning without fine-tuning it is reasonable, and having the code on GitHub is a plus for anyone who wants to try it out.

The paper evaluates with CRPS on long-term forecasting benchmarks and claims it captures multi-modal uncertainty while keeping point forecasts competitive. That sounds like a solid goal for probabilistic methods. The main issue is that the abstract doesn't include any actual numbers, baselines, or experimental setup details. Without those, it's hard to know if the LLM context is really adding value or if the flows are doing the heavy lifting. The assumption that the prefix alignment gives useful global information for the decoder could be the weak point, and the full paper would need to show ablation studies or comparisons to confirm it. The stress-test note is right that no internal inconsistency shows up from the description alone.

This kind of work is aimed at people in the time series community who are exploring LLM integrations for uncertainty quantification. It might be worth a read for someone already working on flow-based or reprogramming methods, but the impact looks incremental based on what's described. I'd say send it for peer review if the experiments in the full manuscript are thorough and the results hold up against standard baselines. The construction itself is coherent and doesn't have obvious circularity or unsupported claims from the abstract.

Referee Report

1 major / 0 minor

Summary. The paper proposes PaP-NF, a probabilistic long-term time series forecasting framework. It aligns continuous time series representations with a frozen LLM via a Prefix-as-Prompt reprogramming mechanism, extracts global context from the LLM, and conditions a normalizing flow decoder on this context to produce predictive distributions. Performance is assessed via CRPS on long-term forecasting benchmarks, with claims that the method robustly captures multi-modal uncertainty while maintaining competitive point-forecast accuracy. The implementation is released at a public GitHub repository.

Significance. If the central performance claims are supported by properly controlled experiments, the work would be of moderate significance: it offers a concrete construction for leveraging frozen LLMs as context extractors inside a probabilistic decoder without LLM fine-tuning, which could be useful for multi-modal uncertainty modeling in time series. The open-source code is a clear positive for reproducibility.

major comments (1)

[Abstract] The central claim that PaP-NF 'robustly captures multi-modal uncertainty while maintaining competitive point forecasting accuracy' across benchmarks is stated without any quantitative results, baseline comparisons, dataset details, or experimental protocol. This absence makes it impossible to evaluate whether the reported performance supports the claim or whether the weakest assumption (effective conditioning from LLM-extracted global context) holds.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the single major comment below.

Referee: [Abstract] The central claim that PaP-NF 'robustly captures multi-modal uncertainty while maintaining competitive point forecasting accuracy' across benchmarks is stated without any quantitative results, baseline comparisons, dataset details, or experimental protocol. This absence makes it impossible to evaluate whether the reported performance supports the claim or whether the weakest assumption (effective conditioning from LLM-extracted global context) holds.

Authors: We agree that the abstract would benefit from greater specificity to allow readers to immediately assess the claims. In the revised version we will expand the abstract to include (i) the main long-term forecasting benchmarks used, (ii) average CRPS improvements relative to the strongest baselines, and (iii) a concise statement of the experimental protocol. These additions will be drawn directly from the quantitative results already reported in Sections 4 and 5. The manuscript body already contains the full dataset descriptions, baseline comparisons, and ablation studies that support the effectiveness of the LLM-derived global context; the abstract revision will simply surface the key numbers for quick evaluation.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The provided abstract and description present PaP-NF as a novel construction that aligns time series representations with a frozen LLM via a Prefix-as-Prompt mechanism and conditions a normalizing flow decoder on the extracted context, with performance evaluated externally via CRPS on benchmarks. No equations, fitted parameters renamed as predictions, self-citations, or uniqueness theorems are present that would reduce the claimed results to inputs by construction. The derivation chain is a high-level architectural proposal whose validity rests on external empirical evaluation rather than internal self-definition or self-referential justification.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard deep learning assumptions about the utility of frozen pre-trained models and the expressivity of normalizing flows; no new entities are postulated and no free parameters are explicitly fitted in the abstract description.

axioms (2)

domain assumption: A frozen LLM can extract useful global context from time series embeddings reprogrammed with learnable prefixes. Invoked as the basis for conditioning the decoder.
domain assumption: Normalizing flows can represent multi-modal predictive distributions when properly conditioned. Core modeling choice for the decoder.
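The second assumption rests on the change-of-variables identity that defines a normalizing flow. Written in the notation of the paper's Figure 3 caption (base sample u_0, invertible maps f_1, …, f_T) and with the fused representation h from Figure 1 as the conditioning input, it reads as follows. How h enters each f_t is an assumption here, since the source does not specify it; T denotes the number of flow steps.

```latex
\begin{aligned}
u_T &= (f_T \circ \cdots \circ f_1)(u_0), \qquad u_0 \sim \mathcal{N}(0, I), \\
\log p(u_T \mid h) &= \log \mathcal{N}(u_0;\, 0, I)
  \;-\; \sum_{t=1}^{T} \log \left| \det
  \frac{\partial f_t(u_{t-1};\, h)}{\partial u_{t-1}} \right| .
\end{aligned}
```

Expressiveness therefore depends on the maps f_t: a single affine map of a Gaussian stays Gaussian and unimodal, so representing several modes requires nonlinear invertible layers, with stacked coupling layers being one common choice.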

