pith. machine review for the scientific record.

arxiv: 2605.08289 · v1 · submitted 2026-05-08 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links · Lean Theorem

What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:29 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords multivariate time series forecasting · sparse bottleneck · cross-variable dependencies · selective routing · information flow · spurious correlations · state-dependent dependencies

The pith

MS-FLOW replaces dense cross-variable links with capacity-limited sparse routing to produce more reliable multivariate forecasts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multivariate time series forecasting depends on capturing how variables influence one another, yet real dependencies are often noisy and change with conditions. Existing dense interaction methods can spread spurious signals and over-smooth representations, hurting reliability in specific states. MS-FLOW enforces a strict budget on information flow by routing signals only through a small number of selected dependency paths. This produces forecasts with higher accuracy across twelve real-world datasets while retaining fewer but more trustworthy connections. The result reframes the problem as one of effective rather than maximal interaction.

Core claim

MS-FLOW replaces fully connected communication with selective sparse routing, retaining only a few critical dependency paths and injecting cross-variable signals under a strict communication budget, thereby suppressing redundant connections and spurious-correlation propagation. Extensive experiments demonstrate that MS-FLOW learns more reliable multivariate correlations, achieving state-of-the-art forecasting accuracy on 12 real-world benchmarks while producing fewer yet more reliable dependencies.

What carries the argument

The sparse-bottleneck framework that models inter-variable interaction as capacity-limited information flow via selective sparse routing.
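
To make the mechanism concrete, the sketch below shows one generic way a capacity-limited routing layer can be built: score all variable pairs, keep only the top-K incoming paths per variable, and inject messages along those paths alone. This is an editorial illustration, not the authors' MS-FLOW code; the module name, scoring function, and residual injection are assumptions.

```python
# Editorial sketch only: a generic top-K cross-variable routing layer with a hard
# communication budget. Not the authors' MS-FLOW implementation; the scoring,
# masking, and residual injection choices here are assumptions.
import torch
import torch.nn as nn


class TopKCrossVariableRouter(nn.Module):
    """Each variable receives messages from at most k other variables."""

    def __init__(self, d_model: int, k: int):
        super().__init__()
        self.k = k
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, num_vars, d_model), one embedding per variable.
        q, k_, v = self.query(h), self.key(h), self.value(h)
        scores = q @ k_.transpose(-1, -2) / h.size(-1) ** 0.5   # (B, V, V)
        # Enforce the budget: keep only the k largest incoming scores per variable.
        topk = scores.topk(self.k, dim=-1)
        mask = torch.full_like(scores, float("-inf"))
        mask.scatter_(-1, topk.indices, topk.values)
        attn = mask.softmax(dim=-1)        # pruned paths get exactly zero weight
        return h + attn @ v                # sparse cross-variable message injection


# Usage: batch of 32 series with 7 variables, 64-dim embeddings, budget of 3 paths.
router = TopKCrossVariableRouter(d_model=64, k=3)
out = router(torch.randn(32, 7, 64))
print(out.shape)  # torch.Size([32, 7, 64])
```

Setting k equal to the number of variables recovers dense all-to-all mixing, which is exactly the ablation the falsification test further down would rely on.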

If this is right

  • Forecasting models become more robust when cross-variable signals are restricted to a small set of critical paths.
  • Representation over-smoothing decreases because redundant and noisy links are suppressed by the communication budget.
  • Performance gains appear on diverse real-world datasets rather than synthetic ones with clean dense dependencies.
  • The design shifts emphasis from adding more interaction mechanisms to controlling which interactions are allowed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same bottleneck idea could be tested in other sequence tasks where dense attention spreads noise across unrelated tokens.
  • An adaptive version of the budget might further improve results on datasets whose dependency density varies strongly over time.
  • The framework implies that explicit capacity limits can serve as a regularizer in any multivariate modeling setting prone to spurious links.

Load-bearing premise

Dependencies between variables in real data are typically state-dependent and noisy, so dense connections tend to amplify spurious correlations and over-smooth representations.
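
A toy simulation (not from the paper) makes the premise tangible: when only a couple of channels carry the driving signal and the rest are noise, averaging over every channel dilutes the signal, while keeping the two best-correlated channels preserves it. All sizes and noise levels below are illustrative assumptions.

```python
# Toy simulation (not from the paper): only 2 of 20 channels carry the driving
# signal. Dense averaging over all channels dilutes it; keeping the top-2 most
# correlated channels preserves it.
import numpy as np

rng = np.random.default_rng(0)
T, V = 2000, 20
driver = rng.standard_normal(T)           # latent signal behind the target
X = rng.standard_normal((T, V))           # 18 channels are unrelated noise
X[:, 0] = driver + 0.3 * rng.standard_normal(T)
X[:, 1] = driver + 0.3 * rng.standard_normal(T)
y = driver

dense_msg = X.mean(axis=1)                # dense mixing: average all 20 channels
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(V)])
sparse_msg = X[:, np.argsort(corr)[-2:]].mean(axis=1)   # top-2 routing

print("corr(dense message, target): %.2f" % np.corrcoef(dense_msg, y)[0, 1])
print("corr(sparse message, target): %.2f" % np.corrcoef(sparse_msg, y)[0, 1])
# The sparse message tracks the target far more closely: the irrelevant channels
# only add variance to the dense average.
```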

What would settle it

If a version of MS-FLOW with the sparsity removed (allowing full dense connections) matches or exceeds its accuracy on the same twelve benchmarks without increasing measured spurious correlations, the central claim would be weakened.

Figures

Figures reproduced from arXiv: 2605.08289 by Fan Zhang, Hua Wang, Shiming Fan.

Figure 1: MS-FLOW's motivation and overview. (a) State dependency: cross-variable dependencies vary over time and can switch across operating conditions. (b) Spurious correlation: local perturbations may induce non-generalizable, misleading correlations. (c) Channel-processing strategies: different paradigms for handling cross-channel interactions. (d) Sparse-flow bottleneck: MS-FLOW constrains cross-variable …
Figure 2: The overall architecture of MS-FLOW consists of (a) patch embedding to represent the input X, (b) patch-wise temporal interaction to capture temporal patterns, and (c) a sparse dependency bottleneck to selectively route cross-variable information for predicting Y.
Figure 4: Model effectiveness and efficiency comparison on the ETTm2 and Traffic datasets.
Figure 5: Sensitivity analysis of the key hyperparameters in MS-FLOW (the sparsity level K in the sparse bottleneck and the encoder depth).
Figure 6: Impact of different lookback window lengths on forecasting accuracy across datasets; the shaded regions indicate the maximum performance gap between MS-FLOW and the baseline models.
Figure 7: Forecasting results on ETTh2, ETTm2, ECL, and Traffic with a prediction horizon of 96; for each dataset, the ground truth is shown as a solid line and the prediction as a dashed line.
Figure 8: Forecasting results on ETTh2, ETTm2, ECL, and Traffic with a prediction horizon of 192; for each dataset, the ground truth is shown as a solid line and the prediction as a dashed line.
Figure 9: Forecasting results on ETTh2, ETTm2, ECL, and Traffic with a prediction horizon of 336; for each dataset, the ground truth is shown as a solid line and the prediction as a dashed line.
Figure 10: Forecasting results on ETTh2, ETTm2, ECL, and Traffic with a prediction horizon of 720; for each dataset, the ground truth is shown as a solid line and the prediction as a dashed line.
Figure 11: Additional noise-robustness analysis on four PEM subsets; for each subset, the pollution hit rate of Top-K, Random-K, and Dense strategies is compared, and the corresponding learned adjacency matrices are visualized with polluted channels highlighted.
Figure 12: Visualization of correlation structures and learned dependency maps, comparing the input correlation, the intermediate dependency maps learned by MS-FLOW, and the target correlation; the gradual transition illustrates how cross-variable dependencies are progressively reshaped during encoding.
Original abstract

Multivariate time series forecasting is critical in many real-world systems, and thus modeling cross-channel dependencies is essential. Although existing methods improve overall accuracy by enhancing representations and cross-channel interactions, it remains challenging to reliably capture inter-variable dependencies under specific conditions. We observe that dependencies in real data are often state-dependent and noisy; in such cases, dense interactions can amplify spurious correlations and lead to representation over-smoothing, which may yield unreliable predictions in certain scenarios. Motivated by this, we propose MS-FLOW, a sparse-bottleneck framework that explicitly models inter-variable interaction as capacity-limited information flow. Specifically, MS-FLOW replaces fully connected communication with selective sparse routing, retaining only a few critical dependency paths and injecting cross-variable signals under a strict communication budget, thereby suppressing redundant connections and spurious-correlation propagation. Extensive experiments demonstrate that MS-FLOW learns more reliable multivariate correlations, achieving state-of-the-art forecasting accuracy on 12 real-world benchmarks while producing fewer yet more reliable dependencies, shifting multivariate forecasting from "more interaction" to "more effective interaction".

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes MS-FLOW, a sparse-bottleneck framework for multivariate time series forecasting. It observes that real-world dependencies are often state-dependent and noisy, such that dense cross-variable interactions amplify spurious correlations and cause representation over-smoothing. MS-FLOW replaces fully connected layers with selective sparse routing that retains only a few critical dependency paths under a strict communication budget, claiming this yields more reliable correlations, state-of-the-art accuracy on 12 real-world benchmarks, and a shift from 'more interaction' to 'more effective interaction'.

Significance. If the empirical claims are substantiated with rigorous controls, the work offers a conceptually clean architectural alternative that could influence future MTS models by treating inter-variable communication as a capacity-limited resource rather than an unrestricted graph. The emphasis on suppressing redundant paths while preserving state-dependent signals is a timely contribution to robustness in noisy forecasting settings.

major comments (2)
  1. [Experimental Evaluation] Experimental section (and associated tables/figures): the headline claim that MS-FLOW produces 'more reliable' dependencies rests solely on forecasting accuracy gains; no independent metric (e.g., dependency recovery on synthetic data with known ground-truth paths, or a reliability score orthogonal to MSE/MAE) is reported to separate the effect of sparsity from generic regularization.
  2. [Proposed Method] Method section: the precise definition of the 'strict communication budget' and the routing mechanism (e.g., how critical paths are selected at each time step or state) is not formalized with equations or pseudocode that would allow reproduction or analysis of whether the selection is demonstrably more robust than dense baselines under the paper's own noise assumptions.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'fewer yet more reliable dependencies' is repeated without a parenthetical clarification of how reliability is quantified beyond accuracy.
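
For concreteness on the second major comment, the block below is one way a strict communication budget could be written down. It is an editorial reconstruction under assumed notation (per-variable states h_i, scores S_ij, budget K, scoring function phi, value map W), not the paper's actual formulation.

```latex
% Editorial reconstruction of what a hard top-K communication budget could look
% like; not MS-FLOW's actual equations. h_i: state of variable i, C: number of
% variables, K: budget, \phi: a learned pairwise scoring function, W: a value map.
\begin{aligned}
S_{ij} &= \phi(h_i, h_j), \qquad i, j \in \{1, \dots, C\}
  && \text{pairwise dependency scores} \\
\mathcal{N}_K(i) &= \operatorname*{arg\,top\text{-}K}_{j \neq i}\; S_{ij}
  && \text{at most } K \text{ incoming paths per variable} \\
\alpha_{ij} &= \frac{\exp(S_{ij})\, \mathbb{1}[\, j \in \mathcal{N}_K(i)\,]}
                    {\sum_{l \in \mathcal{N}_K(i)} \exp(S_{il})}
  && \text{renormalised weights on surviving paths} \\
h_i' &= h_i + \textstyle\sum_{j \in \mathcal{N}_K(i)} \alpha_{ij}\, W h_j
  && \text{sparse cross-variable message injection}
\end{aligned}
```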

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below, agreeing that clarifications and additions will strengthen the manuscript.

Point-by-point responses
  1. Referee: [Experimental Evaluation] Experimental section (and associated tables/figures): the headline claim that MS-FLOW produces 'more reliable' dependencies rests solely on forecasting accuracy gains; no independent metric (e.g., dependency recovery on synthetic data with known ground-truth paths, or a reliability score orthogonal to MSE/MAE) is reported to separate the effect of sparsity from generic regularization.

    Authors: We acknowledge that the primary support for claiming more reliable dependencies comes from the observed forecasting accuracy improvements across the 12 benchmarks. While these gains occur in settings with noisy, state-dependent dependencies—where dense models are prone to spurious correlations—we agree that an independent metric would more directly isolate the benefits of the sparse routing. In the revised manuscript, we will add synthetic experiments with known ground-truth dependency structures to quantify path recovery accuracy separately from MSE/MAE. revision: yes

  2. Referee: [Proposed Method] Method section: the precise definition of the 'strict communication budget' and the routing mechanism (e.g., how critical paths are selected at each time step or state) is not formalized with equations or pseudocode that would allow reproduction or analysis of whether the selection is demonstrably more robust than dense baselines under the paper's own noise assumptions.

    Authors: We thank the referee for highlighting this gap. The method section currently presents the sparse-bottleneck routing conceptually. We will revise it to include explicit equations defining the communication budget constraint, the per-timestep path selection criterion, and pseudocode for the overall procedure. This will facilitate reproduction and enable direct comparison of robustness under the noise models discussed in the paper. revision: yes
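
One concrete shape the synthetic check promised in response 1 could take: simulate a system with a known sparse lag-1 dependency graph, recover a graph from the data, and score edge precision and recall independently of MSE/MAE. The generator, recovery rule, and sizes below are assumptions, not the authors' protocol.

```python
# One possible shape for the promised synthetic check (an assumption, not the
# authors' protocol): simulate a VAR(1) system with a known sparse dependency
# graph, recover a graph from lag-1 correlations, and score edge recovery
# independently of MSE/MAE.
import numpy as np

rng = np.random.default_rng(1)
V, T, E = 10, 5000, 12                    # variables, timesteps, true directed edges

# Ground-truth transition matrix: weak self-dependence plus E off-diagonal edges.
M = 0.3 * np.eye(V)
off = np.array([(i, j) for i in range(V) for j in range(V) if i != j])
for i, j in off[rng.choice(len(off), size=E, replace=False)]:
    M[i, j] = 0.5
M *= 0.9 / np.abs(np.linalg.eigvals(M)).max()    # rescale so the system is stable

X = np.zeros((T, V))
for t in range(1, T):
    X[t] = X[t - 1] @ M.T + 0.5 * rng.standard_normal(V)

# Recovered graph: strongest E off-diagonal lag-1 correlations (a stand-in for
# whatever dependency map a forecasting model learns).
C = np.abs(np.corrcoef(X[1:].T, X[:-1].T)[:V, V:])   # C[i, j] ~ influence of j on i
np.fill_diagonal(C, 0.0)
pred = np.zeros((V, V), dtype=bool)
pred[np.unravel_index(np.argsort(C, axis=None)[-E:], C.shape)] = True

truth = (M != 0) & ~np.eye(V, dtype=bool)
tp = (pred & truth).sum()
print(f"edge precision={tp / pred.sum():.2f}  recall={tp / truth.sum():.2f}")
```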

Circularity Check

0 steps flagged

No circularity: claims rest on empirical benchmarks, not self-referential derivations

full rationale

The paper introduces MS-FLOW as an architectural proposal motivated by observational remarks on state-dependent noise in real data. No mathematical derivation chain is presented that reduces a claimed result (such as 'more reliable dependencies') to a fitted parameter, self-citation, or input by construction. The abstract and available text contain no equations equating predictions to inputs, no uniqueness theorems imported from prior author work, and no ansatz smuggled via citation. Central claims are evidenced by forecasting accuracy on 12 external benchmarks, which are independently falsifiable and not forced by the model's internal definitions. This is the expected non-finding for an empirical architecture paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that real multivariate dependencies are frequently state-dependent and noisy and that limiting information flow will suppress spurious correlations without discarding critical signals.

axioms (1)
  • domain assumption: Dependencies in real data are often state-dependent and noisy.
    Explicitly stated as the observation motivating the sparse design.

pith-pipeline@v0.9.0 · 5481 in / 1134 out tokens · 47611 ms · 2026-05-12T02:29:05.072806+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

120 extracted references · 120 canonical work pages · 8 internal anchors
