arxiv: 2305.10721 · v2 · pith:HDQ4MNFVnew · submitted 2023-05-18 · 💻 cs.LG · cs.AI

Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping

Pith reviewed 2026-05-24 08:29 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: long-term time series forecasting · affine mapping · linear models · periodic signals · reversible normalization · input horizon · transition matrix

The pith

Affine mapping dominates long-term time series forecasting performance, with different models learning nearly identical transition matrices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines why complex architectures for long-term time series forecasting often match or only slightly beat a single linear layer. Experiments on simulated and real data show that an affine transformation from input window to forecast horizon accounts for the bulk of accuracy on standard benchmarks. Different models converge on similar transition matrices, which succeed on periodic signals but falter on non-periodic trends or channels with mismatched periods. Reversible normalization converts trends into periodic-like forms that the mapping can handle, while longer input horizons aid multi-channel cases with varying periods.

Core claim

Affine mapping dominates forecasting performance across commonly utilized benchmarks, with models learning similar transition matrices from input to output. Affine mapping effectively captures periodic patterns but struggles with non-periodic signals or time series with varying periods across channels. Reversible normalization significantly enhances trend forecasting by transforming non-periodic trends into periodic-like patterns. Increasing input horizon improves performance on multi-channel data with different periods.

What carries the argument

Affine mapping: the linear transformation plus bias that converts the input time series window directly into the predicted output window.
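As a rough illustration of this definition, the sketch below fits a single affine map `y ≈ x @ W + b` by least squares on a toy periodic signal and forecasts held-out windows. The helper names (`make_windows`, `fit_affine`), the signal, and the window lengths are assumptions made for this example and do not come from the paper or its repository.

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Slice a 1-D series into (input window, forecast window) pairs."""
    xs, ys = [], []
    for t in range(len(series) - n_in - n_out + 1):
        xs.append(series[t:t + n_in])
        ys.append(series[t + n_in:t + n_in + n_out])
    return np.array(xs), np.array(ys)

def fit_affine(x, y):
    """Least-squares fit of y ≈ x @ W + b; the bias comes from an appended column of ones."""
    x_aug = np.hstack([x, np.ones((len(x), 1))])
    # Explicit rcond: the windows of a periodic signal are highly rank-deficient.
    coef, *_ = np.linalg.lstsq(x_aug, y, rcond=1e-10)
    return coef[:-1], coef[-1]  # W: (n_in, n_out), b: (n_out,)

# Toy periodic signal (period 24) with a constant offset; illustrative values only.
t = np.arange(400)
series = 2.0 + np.sin(2 * np.pi * t / 24)

x, y = make_windows(series, n_in=48, n_out=24)
W, b = fit_affine(x[:200], y[:200])      # fit on the first 200 windows
pred = x[200:] @ W + b                   # forecast the remaining windows
max_err = float(np.max(np.abs(pred - y[200:])))
print(W.shape, b.shape, max_err)
```

On a noiseless periodic toy signal like this one the held-out error should be close to machine precision; real benchmarks add noise, trends, and multiple channels, so this only shows the shape of the computation.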

If this is right

  • Models with different architectures converge on similar transition matrices for the same data.
  • Affine mapping succeeds on periodic patterns but cannot handle non-periodic components without help.
  • Reversible normalization improves trend accuracy by reshaping non-periodic signals into periodic-like forms.
  • Longer input horizons raise accuracy when different channels have mismatched periods.
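The reversible-normalization point can be made concrete with a small sketch. It assumes a simplified per-window normalization (subtract the window mean, divide by the window standard deviation) and omits the learnable affine parameters that RevIN [10] also includes. On a toy linear trend, every normalized input window looks identical, so one fixed forecast template serves all windows once it is de-normalized. Function names and constants are hypothetical.

```python
import numpy as np

def instance_normalize(window, eps=1e-8):
    """Normalize one input window by its own mean and standard deviation."""
    mean = window.mean()
    std = window.std() + eps
    return (window - mean) / std, mean, std

def instance_denormalize(values, mean, std):
    """Invert the normalization so forecasts return to the original scale."""
    return values * std + mean

# Toy linear trend; slope and intercept are illustrative assumptions.
t = np.arange(300, dtype=float)
trend = 0.5 * t + 10.0
n_in, n_out = 32, 8

# Two input windows taken far apart on the trend.
early, e_mean, e_std = instance_normalize(trend[0:n_in])
late, l_mean, l_std = instance_normalize(trend[200:200 + n_in])

# Future of the early window, expressed in the early window's normalized units.
template = (trend[n_in:n_in + n_out] - e_mean) / e_std

# Reusing the same template for the late window and de-normalizing it.
late_forecast = instance_denormalize(template, l_mean, l_std)
late_truth = trend[200 + n_in:200 + n_in + n_out]
print(np.allclose(early, late), np.allclose(late_forecast, late_truth))
```

The design point is that the normalization statistics carry the level and scale of the trend, leaving the mapping itself with a repeating shape to learn.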

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future work could test whether forcing all models to share one transition matrix eliminates most performance gaps between architectures.
  • Benchmarks built around stronger non-periodic or chaotic components would expose where affine approaches break.
  • Explicit cross-channel period modeling may be needed once linear mappings have extracted the periodic part.

Load-bearing premise

The observed similarity of transition matrices and the effectiveness of affine mapping will hold for signals and datasets beyond the specific periodic benchmarks tested.

What would settle it

An experiment in which a non-linear model reaches high accuracy on a benchmark while its learned input-to-output matrix differs substantially from the matrix learned by a plain linear layer.
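One possible way to run such a comparison is sketched below. It summarizes any black-box forecaster by a least-squares affine surrogate fitted on probe inputs, then compares the surrogate matrices by cosine similarity. This procedure, the two toy models, and all names are assumptions for illustration; the paper itself reports weight visualizations, and its exact comparison method is not given on this page.

```python
import numpy as np

def effective_affine_map(forecast_fn, inputs):
    """Fit outputs ≈ inputs @ W + b by least squares to summarize a forecaster."""
    outputs = np.array([forecast_fn(x) for x in inputs])
    aug = np.hstack([inputs, np.ones((len(inputs), 1))])
    coef, *_ = np.linalg.lstsq(aug, outputs, rcond=None)
    return coef[:-1], coef[-1]

def cosine_similarity(a, b):
    """Cosine similarity between two matrices, treated as flat vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
n_in, n_out = 6, 3
W_true = rng.normal(size=(n_in, n_out))

def linear_model(x):
    # Hypothetical plain linear forecaster.
    return x @ W_true

def mildly_nonlinear_model(x):
    # Hypothetical model: the same map plus a small non-linear term.
    return x @ W_true + 0.01 * np.tanh(x[:n_out])

probes = rng.normal(size=(500, n_in))
W_lin, _ = effective_affine_map(linear_model, probes)
W_non, _ = effective_affine_map(mildly_nonlinear_model, probes)
sim = cosine_similarity(W_lin, W_non)
print(sim)
```

A similarity near 1 alongside high accuracy would be consistent with the paper's claim; a much lower similarity alongside high accuracy would be the counterexample described above.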

Figures

Figures reproduced from arXiv: 2305.10721 by Shiyi Qi, Yiduo Li, Zenglin Xu, Zhe Li.

Figure 1: The general framework for time series forecasting, comprising RevIN [10], a temporal feature extractor, and a linear projection layer.
Figure 2: Forecasting results of selected models on ETTh1 […
Figure 3: Weights visualization on ETTh1, where the input and output lengths are set to 96.
Figure 4: Forecasting visualization of a linear model on simulated seasonal and trend signals.
Figure 5: Forecasting results on the trend signal with different normalization methods.
Figure 6: The effect of RevIN applied to seasonal and trend signals. Each segment separated by a…
Figure 7: Forecasting results of a linear layer with RevIN on simulated time series with seasonal and…
Figure 8: Forecasting results on simulated time series with three channels of different periods.
Figure 9: Simulated sine waves with angular frequencies ranging from 1/30 to 1/3 and a length of 200.
Figure 10: Left: Forecasting results on simulated 2-variate time series. ∆ω denotes the difference in angular frequency between channels. Middle: Forecasting results of different models on simulated datasets with different periodic channels. Right: Impact of input horizon on forecasting performance.
Theorem 3. Let X = [s_1, s_2, …, s_c]^T ∈ R^(c×n) be the input historical multivariate time series with c channels and…
Figure 11: Impact of input horizon on forecasting results. Lower MSE indicates better performance.
Original abstract

Introduction: Long-term time series forecasting (LTSF) has gained significant attention in recent years. While various specialized designs exist for capturing temporal dependency, recent studies have shown that even a single linear layer can achieve competitive performance. This paper investigates the intrinsic effectiveness of recent LTSF approaches and reveals the critical role of affine mapping. Materials and methods: We conduct comprehensive experiments on both simulated and real-world datasets to analyze the components of state-of-the-art models. A theoretical analysis is provided to explain the working mechanisms of affine mapping in periodic signal forecasting. We evaluate the impact of reversible normalization and input horizon extension on model robustness. Results: We find that (1) affine mapping dominates forecasting performance across commonly utilized benchmarks, with models learning similar transition matrices from input to output; (2) affine mapping effectively captures periodic patterns but struggles with non-periodic signals or time series with varying periods across channels; (3) reversible normalization significantly enhances trend forecasting by transforming non-periodic trends into periodic-like patterns; (4) increasing input horizon improves performance on multi-channel data with different periods. Code is available at: https://github.com/plumprc/RTSF. Conclusions: Our findings provide theoretical and experimental insights into the working mechanisms of LTSF models, highlighting both the strengths and limitations of linear approaches. The results suggest that future model development should focus on handling cross-channel period variations and non-periodic components.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that affine (linear) mappings dominate performance in long-term time series forecasting (LTSF). It supports this via experiments on simulated and real datasets showing that state-of-the-art models learn similar input-to-output transition matrices, a theoretical sketch explaining why affine mappings work for periodic signals, and ablations demonstrating that reversible normalization converts non-periodic trends into periodic-like patterns while longer input horizons help with cross-channel period variation. The work concludes that future LTSF research should prioritize handling non-periodic components and period heterogeneity across channels, and releases code at https://github.com/plumprc/RTSF.

Significance. If the central empirical and theoretical claims hold, the paper offers a useful mechanistic explanation for why simple linear baselines remain competitive on standard LTSF benchmarks. The public code is a clear strength that supports reproducibility and allows independent verification of the transition-matrix comparisons. The findings also usefully delineate the limits of linear approaches on non-periodic or multi-period data.

major comments (2)
  1. [Experimental analysis of transition matrices] The headline claim that 'models learning similar transition matrices from input to output' underpins the assertion that affine mapping is the dominant mechanism. The experimental section comparing these matrices across architectures does not report whether optimizers, learning rates, initializations, batch sizes, or early-stopping criteria were held constant; without such controls the observed similarity could be an artifact of training dynamics rather than evidence of data-driven convergence (see skeptic note).
  2. [Results and abstract] Results and abstract report performance numbers without error bars, standard deviations across random seeds, or statistical significance tests. This omission makes it difficult to assess whether the reported dominance of affine mapping over other components is robust, especially on the real-world benchmarks.
minor comments (2)
  1. [Theoretical analysis] The theoretical analysis for periodic signals is referenced but its assumptions (e.g., stationarity, exact periodicity) should be stated explicitly so readers can judge its applicability to the non-periodic cases examined later.
  2. [Figures and tables] Figure captions and table legends could more clearly indicate which models were trained with identical versus default hyperparameters when transition matrices are visualized.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the experimental reporting.

  1. Referee: [Experimental analysis of transition matrices] The headline claim that 'models learning similar transition matrices from input to output' underpins the assertion that affine mapping is the dominant mechanism. The experimental section comparing these matrices across architectures does not report whether optimizers, learning rates, initializations, batch sizes, or early-stopping criteria were held constant; without such controls the observed similarity could be an artifact of training dynamics rather than evidence of data-driven convergence (see skeptic note).

    Authors: We agree that explicit control of training hyperparameters is necessary to support the claim of data-driven convergence to similar transition matrices. All models in our experiments used the Adam optimizer with learning rate 0.001, batch size 32, and early stopping with patience of 3 epochs on validation loss, following the protocols in the original model papers. Initializations used PyTorch defaults. To eliminate any ambiguity, we will add a new subsection (Section 4.1) that tabulates these settings for every architecture and dataset, confirming they were held constant. revision: yes

  2. Referee: [Results and abstract] Results and abstract report performance numbers without error bars, standard deviations across random seeds, or statistical significance tests. This omission makes it difficult to assess whether the reported dominance of affine mapping over other components is robust, especially on the real-world benchmarks.

    Authors: We acknowledge the value of statistical reporting for assessing robustness. In the revised manuscript we will rerun all real-world experiments with five random seeds, report mean and standard deviation, and include paired t-test p-values comparing the linear baseline against competing models. These updates will appear in the results tables, the abstract, and a new paragraph in Section 5. revision: yes
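The statistical reporting promised in this response could be computed along the following lines. This is a minimal standard-library sketch: the MSE values are placeholders invented for the example, not results from the paper, and the helper names are hypothetical.

```python
import math
import statistics

def summarize(scores):
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t_statistic(a, b):
    """t statistic for paired samples (same seeds, two models); df = len(a) - 1."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))

# Placeholder MSE values for five seeds; these are NOT results from the paper.
linear_mse = [0.40, 0.42, 0.41, 0.39, 0.43]
other_mse = [0.41, 0.44, 0.42, 0.41, 0.45]

lin_mean, lin_sd = summarize(linear_mse)
t_stat = paired_t_statistic(linear_mse, other_mse)
print(f"linear: {lin_mean:.3f} ± {lin_sd:.3f}, paired t = {t_stat:.3f}")
```

Pairing by seed matters here because both models see the same data splits and initial conditions per seed. If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic together with a p-value.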

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external benchmarks and independent theoretical analysis

The paper's central claims rest on comprehensive experiments across simulated and real-world datasets plus a theoretical analysis of affine mapping for periodic signals. No equations or results reduce by construction to fitted parameters defined inside the paper, nor do any load-bearing steps collapse to self-citations or self-definitional loops. The transition-matrix similarity observation is an empirical outcome from running models on benchmarks rather than a renaming or forced prediction. The derivation chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard linear algebra for the affine map and on the assumption that the chosen benchmarks are representative; no new free parameters, axioms, or invented entities are introduced beyond those already standard in time-series modeling.

axioms (1)
  • domain assumption: An affine transformation can exactly recover a periodic component when the input window length is sufficient relative to the period.
    Invoked in the theoretical analysis section referenced in the abstract.
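The assumption can be checked on a toy case. The sketch below builds a 0/1 matrix that copies each forecast step from the input sample with the same phase, in the spirit of the Theorem 1 construction quoted elsewhere on this page. It uses 0-based indexing and only the most recent full period, which is a simplification chosen for this example; the function name and test signal are hypothetical.

```python
import numpy as np

def periodic_shift_matrix(n_in, n_out, period):
    """0/1 matrix copying each forecast step from an input sample with the same phase."""
    if period > n_in:
        raise ValueError("input window must cover at least one full period")
    W = np.zeros((n_in, n_out))
    for j in range(n_out):
        i = n_in - period + (j % period)  # 0-based index inside the last full period
        W[i, j] = 1.0
    return W

period, n_in, n_out = 7, 10, 15
rng = np.random.default_rng(1)
shape = rng.normal(size=period)   # arbitrary waveform for one period
series = np.tile(shape, 10)       # strictly periodic series of length 70

W = periodic_shift_matrix(n_in, n_out, period)
start = 4
x = series[start:start + n_in]
y_true = series[start + n_in:start + n_in + n_out]
y_pred = x @ W                    # the bias is zero in this construction
print(np.allclose(y_pred, y_true))
```

The `ValueError` branch marks where the assumption fails: with an input window shorter than the period, no sample of the required phase is available to copy.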

pith-pipeline@v0.9.0 · 5793 in / 1249 out tokens · 18025 ms · 2026-05-24T08:29:06.449871+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    Theorem 1. Given a seasonal time series satisfying x(t) = s(t) = s(t − p) … there always exists an analytical solution for the linear model … W^(k)_ij = 1 if i = n − kp + (j mod p), 0 otherwise

  • IndisputableMonolith/Foundation/DimensionForcing.lean reality_from_one_distinction echoes
    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    linear mapping can effectively capture periodic features … increasing input horizon improves performance on multi-channel data with different periods
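The input-horizon claim in the passage above can be illustrated with a toy experiment. The sketch fits one affine map shared by two channels with periods 4 and 6 and compares the in-sample error for an input length of 4 against an input length of 12, the least common multiple of the two periods. The setup, names, and random waveforms are assumptions for this example and do not reproduce the paper's simulations.

```python
import numpy as np

def windows(series, n_in, n_out):
    """Slice a 1-D series into (input window, forecast window) pairs."""
    idx = range(len(series) - n_in - n_out + 1)
    x = np.array([series[t:t + n_in] for t in idx])
    y = np.array([series[t + n_in:t + n_in + n_out] for t in idx])
    return x, y

def shared_affine_error(channels, n_in, n_out):
    """Fit ONE affine map shared by all channels; return its in-sample MSE."""
    xs, ys = zip(*(windows(c, n_in, n_out) for c in channels))
    x, y = np.vstack(xs), np.vstack(ys)
    aug = np.hstack([x, np.ones((len(x), 1))])
    # Explicit rcond: repeated periodic windows make the system rank-deficient.
    coef, *_ = np.linalg.lstsq(aug, y, rcond=1e-10)
    return float(np.mean((aug @ coef - y) ** 2))

rng = np.random.default_rng(0)
chan_a = np.tile(rng.normal(size=4), 60)   # period 4, length 240
chan_b = np.tile(rng.normal(size=6), 40)   # period 6, length 240

err_short = shared_affine_error([chan_a, chan_b], n_in=4, n_out=4)
err_long = shared_affine_error([chan_a, chan_b], n_in=12, n_out=4)
print(err_short, err_long)
```

With 12 input steps, a single shift-by-12 map is exact for both channels, so the shared fit has essentially zero error. With 4 input steps, no shared map fits both periods for generic waveforms and a residual remains.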

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling

    cs.LG 2026-05 unverdicted novelty 7.0

    LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapt...

  2. Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting

    cs.LG 2025-09 unverdicted novelty 7.0

    Super-Linear introduces a pretrained MoE architecture using frequency-specialized linear experts and spectral gating for efficient general time series forecasting.

  3. GeoCert: Certified Geometric AI for Reliable Forecasting

    cs.LG 2026-04 unverdicted novelty 6.0

    GeoCert uses hyperbolic geometry to unify forecasting with physical reasoning and built-in formal certification, claiming major gains in accuracy and efficiency.

  4. UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration

    cs.LG 2026-03 unverdicted novelty 6.0

    UniMamba integrates Mamba state-space dynamics with attention layers and transforms like FFT-Laplace to outperform prior models on multivariate time series forecasting benchmarks.

  5. DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis

    cs.LG 2026-01 unverdicted novelty 6.0

    DeMa is a dual-path delay-aware Mamba architecture that decomposes MTS into intra-series temporal and inter-series variate paths to achieve SOTA performance with linear complexity on forecasting, imputation, anomaly d...

  6. Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting

    cs.LG 2025-05 unverdicted novelty 6.0

    Logo-LLM improves time series forecasting by pulling local dynamics from shallow LLM layers and global trends from deeper layers, then aligning them via new Local-Mixer and Global-Mixer modules.

  7. Titans: Learning to Memorize at Test Time

    cs.LG 2024-12 unverdicted novelty 6.0

    Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

  8. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

    cs.LG 2023-10 unverdicted novelty 6.0

    By applying attention and feed-forward networks to inverted variate tokens instead of temporal tokens, iTransformer achieves state-of-the-art performance on real-world time series forecasting datasets.

  9. TSNN: A Non-parametric and Interpretable Framework for Traffic Time Series Forecasting

    cs.LG 2026-05 unverdicted novelty 5.0

    TSNN matches time series entries to a training-derived memory bank to forecast traffic without any trainable parameters and achieves competitive accuracy on four real-world datasets.

  10. Characteristic Root Analysis and Regularization for Linear Time Series Forecasting

    cs.LG 2025-09 unverdicted novelty 5.0

    Characteristic roots govern dynamics in linear forecasting models but noise induces spurious roots; rank reduction and Root Purge regularization mitigate this for more robust predictions.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 10 Pith papers · 1 internal anchor

  [1] Oliver D. Anderson, George E. P. Box, and Gwilym M. Jenkins. Time series analysis: Forecasting and control. The Statistician, 27(3/4):265, September 1978

  [2] Rafal A. Angryk, Petrus C. Martens, Berkay Aydin, Dustin J. Kempton, Sushant S. Mahajan, Sunitha Basodi, Azim Ahmadzadeh, Xumin Cai, Soukaina Filali Boubrahimi, Shah Muhammad Hamdi, Michael A. Schuh, and Manolis K. Georgoulis. Multivariate time series dataset for space weather data analytics. Scientific data, 7:227, 2020

  [3] Adebiyi A Ariyo, Adewumi O Adewumi, and Charles K Ayo. Stock price prediction using the arima model. In 2014 UKSim-AMSS 16th international conference on computer modelling and simulation, pages 106–112. IEEE, 2014

  [4] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. ArXiv, abs/1803.01271, 2018

  [5] Chao Chen, Karl F. Petty, Alexander Skabardonis, Pravin Pratap Varaiya, and Zhanfeng Jia. Freeway performance measurement system: Mining loop detector data. Transportation Research Record, 1748:96–102, 2001

  [6] Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O. Arik, and Tomas Pfister. Tsmixer: An all-mlp architecture for time series forecasting. ArXiv, abs/2303.06053, 2023

  [7] Wei Fan, Pengyang Wang, Dongkun Wang, Dongjie Wang, Yuanchun Zhou, and Yanjie Fu. Dish-ts: A general paradigm for alleviating distribution shift in time series forecasting. ArXiv, abs/2302.14829, 2023

  [8] Charles C Holt. Forecasting seasonals and trends by exponentially weighted moving averages. International journal of forecasting, 20:5–10, 2004

  [9] Zulfiqar Ahmad Khan, Tanveer Hussain, Amin Ullah, Seungmin Rho, Mi Young Lee, and Sung Wook Baik. Towards efficient electricity forecasting in residential and commercial buildings: A novel hybrid cnn with a lstm-ae based framework. Sensors, 20:1399, 2020

  [10] Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations, 2022

  [11] Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st international ACM SIGIR conference on research & development in information retrieval, pages 95–104, 2018

  [12] Zhe Li, Zhongwen Rao, Lujia Pan, Pengyun Wang, and Zenglin Xu. Ti-mae: Self-supervised masked time series autoencoders. ArXiv, abs/2301.08871, 2023

  [13] Zhe Li, Zhongwen Rao, Lujia Pan, and Zenglin Xu. Mts-mixers: Multivariate time series forecasting via factorized temporal and channel mixing. ArXiv, abs/2302.04501, 2023

  [14] Minhao Liu, Ailing Zeng, Mu-Hwa Chen, Zhijian Xu, Qiuxia Lai, Lingna Ma, and Qiang Xu. Scinet: Time series modeling and forecasting with sample convolution and interaction. Advances in Neural Information Processing Systems, 35:5816–5828, 2022

  [15] Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X. Liu, and Schahram Dustdar. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In International Conference on Learning Representations, 2022

  [16] Yong Liu, Haixu Wu, Jianmin Wang, and Mingsheng Long. Non-stationary transformers: Exploring the stationarity in time series forecasting. In Neural Information Processing Systems, 2022

  [17] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019

  [18] Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023

  [19] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-perfo...

  [20] Alaa Sagheer and Mostafa Kotb. Time series forecasting of petroleum production using deep lstm recurrent networks. Neurocomputing, 323:203–213, 2019

  [21] Mohammad Amin Shabani, Amir H. Abdi, Lili Meng, and Tristan Sylvain. Scaleformer: Iterative multi-scale refining transformers for time series forecasting. In The Eleventh International Conference on Learning Representations, 2023

  [22] Renzhuo Wan, Shuping Mei, Jun Wang, Min Liu, and Fan Yang. Multivariate temporal convolutional network: A deep neural networks approach for multivariate time series forecasting. Electronics, 8:876, 2019

  [23] Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, and Steven Hoi. CoST: Contrastive learning of disentangled seasonal-trend representations for time series forecasting. In International Conference on Learning Representations, 2022

  [24] Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, and Steven C. H. Hoi. Etsformer: Exponential smoothing transformers for time-series forecasting. ArXiv, abs/2202.01381, 2022

  [25] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. Timesnet: Temporal 2d-variation modeling for general time series analysis. In The Eleventh International Conference on Learning Representations, 2023

  [26] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, 34:22419–22430, 2021

  [27] Ailing Zeng, Mu-Hwa Chen, L. Zhang, and Qiang Xu. Are transformers effective for time series forecasting? ArXiv, abs/2205.13504, 2022

  [28] G Peter Zhang and Min Qi. Neural network forecasting for seasonal and trend time series. European journal of operational research, 160:501–514, 2005

  [29] Xiyuan Zhang, Xiaoyong Jin, Karthick Gopalswamy, Gaurav Gupta, Youngsuk Park, Xingjian Shi, Hongya Wang, Danielle C. Maddix, and Yuyang Wang. First de-trend then attend: Rethinking attention for time-series forecasting. ArXiv, abs/2212.08151, 2022

  [30] Yunhao Zhang and Junchi Yan. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In The Eleventh International Conference on Learning Representations, 2023

  [31] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11106–11115, 2021

  [32] Tian Zhou, Ziqing Ma, Xue Wang, Qingsong Wen, Liang Sun, Tao Yao, and Rong Jin. Film: Frequency improved legendre memory model for long-term time series forecasting. ArXiv, abs/2205.08897, 2022

  [33] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, pages 27268–27286. PMLR, 2022