Recognition: 2 theorem links
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies
Pith reviewed 2026-05-12 02:29 UTC · model grok-4.3
The pith
MS-FLOW replaces dense cross-variable links with capacity-limited sparse routing to produce more reliable multivariate forecasts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MS-FLOW replaces fully connected communication with selective sparse routing, retaining only a few critical dependency paths and injecting cross-variable signals under a strict communication budget, thereby suppressing redundant connections and spurious-correlation propagation. Extensive experiments demonstrate that MS-FLOW learns more reliable multivariate correlations, achieving state-of-the-art forecasting accuracy on 12 real-world benchmarks while producing fewer yet more reliable dependencies.
What carries the argument
The sparse-bottleneck framework that models inter-variable interaction as capacity-limited information flow via selective sparse routing.
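The paper does not formalize its routing equations (the referee report below flags this), but the mechanism as described amounts to scoring candidate dependency paths, keeping only the top few per receiving variable under a fixed budget, and mixing cross-variable signals along the survivors. A minimal sketch of that idea, assuming a generic score matrix and softmax-style mixing (both illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def budgeted_sparse_route(x, scores, budget):
    """Mix cross-variable signals under a per-variable link budget.

    x:      (V, D) per-variable representations
    scores: (V, V) dependency scores; scores[i, j] = how much
            variable i should listen to variable j (illustrative)
    budget: max number of incoming links each variable keeps
    """
    V = x.shape[0]
    keep = np.zeros_like(scores, dtype=bool)
    # Retain only the `budget` highest-scoring senders per receiver;
    # every other link is dropped before any signal propagates.
    top = np.argsort(scores, axis=1)[:, -budget:]
    keep[np.arange(V)[:, None], top] = True
    # Softmax over the surviving links only, so each variable mixes
    # a convex combination of its few critical dependency paths.
    w = np.where(keep, np.exp(scores - scores.max(axis=1, keepdims=True)), 0.0)
    w = w / w.sum(axis=1, keepdims=True)
    return w @ x  # (V, D) budget-limited cross-variable messages
```

Setting `budget` equal to the number of variables recovers dense communication, which is exactly the ablation that the "What would settle it" section proposes as a test of the central claim.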
If this is right
- Forecasting models become more robust when cross-variable signals are restricted to a small set of critical paths.
- Representation over-smoothing decreases because redundant and noisy links are suppressed by the communication budget.
- Performance gains appear on diverse real-world datasets rather than synthetic ones with clean dense dependencies.
- The design shifts emphasis from adding more interaction mechanisms to controlling which interactions are allowed.
Where Pith is reading between the lines
- The same bottleneck idea could be tested in other sequence tasks where dense attention spreads noise across unrelated tokens.
- An adaptive version of the budget might further improve results on datasets whose dependency density varies strongly over time.
- The framework implies that explicit capacity limits can serve as a regularizer in any multivariate modeling setting prone to spurious links.
Load-bearing premise
Dependencies between variables in real data are typically state-dependent and noisy, so dense connections tend to amplify spurious correlations and over-smooth representations.
What would settle it
If a version of MS-FLOW with the sparsity removed (allowing full dense connections) matches or exceeds its accuracy on the same twelve benchmarks without increasing measured spurious correlations, the central claim would be weakened.
Original abstract
Multivariate time series forecasting is critical in many real-world systems, and thus modeling cross-channel dependencies is essential. Although existing methods improve overall accuracy by enhancing representations and cross-channel interactions, it remains challenging to reliably capture inter-variable dependencies under specific conditions. We observe that dependencies in real data are often state-dependent and noisy; in such cases, dense interactions can amplify spurious correlations and lead to representation over-smoothing, which may yield unreliable predictions in certain scenarios. Motivated by this, we propose MS-FLOW, a sparse-bottleneck framework that explicitly models inter-variable interaction as capacity-limited information flow. Specifically, MS-FLOW replaces fully connected communication with selective sparse routing, retaining only a few critical dependency paths and injecting cross-variable signals under a strict communication budget, thereby suppressing redundant connections and spurious-correlation propagation. Extensive experiments demonstrate that MS-FLOW learns more reliable multivariate correlations, achieving state-of-the-art forecasting accuracy on 12 real-world benchmarks while producing fewer yet more reliable dependencies, shifting multivariate forecasting from "more interaction" to "more effective interaction".
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MS-FLOW, a sparse-bottleneck framework for multivariate time series forecasting. It observes that real-world dependencies are often state-dependent and noisy, such that dense cross-variable interactions amplify spurious correlations and cause representation over-smoothing. MS-FLOW replaces fully connected cross-variable communication with selective sparse routing that retains only a few critical dependency paths under a strict communication budget, claiming this yields more reliable correlations, state-of-the-art accuracy on 12 real-world benchmarks, and a shift from 'more interaction' to 'more effective interaction'.
Significance. If the empirical claims are substantiated with rigorous controls, the work offers a conceptually clean architectural alternative that could influence future MTS models by treating inter-variable communication as a capacity-limited resource rather than an unrestricted graph. The emphasis on suppressing redundant paths while preserving state-dependent signals is a timely contribution to robustness in noisy forecasting settings.
major comments (2)
- [Experimental Evaluation] Experimental section (and associated tables/figures): the headline claim that MS-FLOW produces 'more reliable' dependencies rests solely on forecasting accuracy gains; no independent metric (e.g., dependency recovery on synthetic data with known ground-truth paths, or a reliability score orthogonal to MSE/MAE) is reported to separate the effect of sparsity from generic regularization.
- [Proposed Method] Method section: the precise definition of the 'strict communication budget' and the routing mechanism (e.g., how critical paths are selected at each time step or state) is not formalized with equations or pseudocode that would allow reproduction or analysis of whether the selection is demonstrably more robust than dense baselines under the paper's own noise assumptions.
minor comments (1)
- [Abstract] Abstract: the phrase 'fewer yet more reliable dependencies' recurs without clarifying how reliability is quantified beyond forecasting accuracy.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below, agreeing that clarifications and additions will strengthen the manuscript.
Point-by-point responses
Referee: [Experimental Evaluation] Experimental section (and associated tables/figures): the headline claim that MS-FLOW produces 'more reliable' dependencies rests solely on forecasting accuracy gains; no independent metric (e.g., dependency recovery on synthetic data with known ground-truth paths, or a reliability score orthogonal to MSE/MAE) is reported to separate the effect of sparsity from generic regularization.
Authors: We acknowledge that the primary support for claiming more reliable dependencies comes from the observed forecasting accuracy improvements across the 12 benchmarks. While these gains occur in settings with noisy, state-dependent dependencies—where dense models are prone to spurious correlations—we agree that an independent metric would more directly isolate the benefits of the sparse routing. In the revised manuscript, we will add synthetic experiments with known ground-truth dependency structures to quantify path recovery accuracy separately from MSE/MAE.
Revision: yes
Referee: [Proposed Method] Method section: the precise definition of the 'strict communication budget' and the routing mechanism (e.g., how critical paths are selected at each time step or state) is not formalized with equations or pseudocode that would allow reproduction or analysis of whether the selection is demonstrably more robust than dense baselines under the paper's own noise assumptions.
Authors: We thank the referee for highlighting this gap. The method section currently presents the sparse-bottleneck routing conceptually. We will revise it to include explicit equations defining the communication budget constraint, the per-timestep path selection criterion, and pseudocode for the overall procedure. This will facilitate reproduction and enable direct comparison of robustness under the noise models discussed in the paper.
Revision: yes
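The synthetic check promised in the first response could take a shape like the following: plant a known ground-truth adjacency, threshold the model's learned links at the same budget, and score the overlap. This is one plausible protocol, not the authors' stated experiment; the F1 scoring and the top-`budget` thresholding are assumptions for illustration.

```python
import numpy as np

def path_recovery_f1(learned_scores, truth, budget):
    """F1 between budget-limited learned links and planted ground-truth paths.

    learned_scores: (V, V) dependency scores from a trained model
    truth:          (V, V) boolean ground-truth adjacency (planted)
    budget:         links kept per receiving variable
    """
    V = truth.shape[0]
    pred = np.zeros_like(truth, dtype=bool)
    # Binarize learned links the same way the model routes: top-`budget`
    # scoring senders per receiver.
    top = np.argsort(learned_scores, axis=1)[:, -budget:]
    pred[np.arange(V)[:, None], top] = True
    tp = np.logical_and(pred, truth).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(truth.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)
```

A metric of this kind would separate the routing's dependency-recovery ability from generic regularization effects, since it is orthogonal to MSE/MAE.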
Circularity Check
No circularity: claims rest on empirical benchmarks, not self-referential derivations
full rationale
The paper introduces MS-FLOW as an architectural proposal motivated by observational remarks on state-dependent noise in real data. No mathematical derivation chain is presented that reduces a claimed result (such as 'more reliable dependencies') to a fitted parameter, self-citation, or input by construction. The abstract and available text contain no equations equating predictions to inputs, no uniqueness theorems imported from prior author work, and no ansatz smuggled via citation. Central claims are evidenced by forecasting accuracy on 12 external benchmarks, which are independently falsifiable and not forced by the model's internal definitions. This is the expected non-finding for an empirical architecture paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: dependencies in real data are often state-dependent and noisy.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel, tagged unclear: the relation between the paper passage and the cited Recognition theorem is ambiguous.
  Passage: "MS-FLOW replaces fully connected communication with selective sparse routing, retaining only a few critical dependency paths and injecting cross-variable signals under a strict communication budget"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat embedding and J-positivity, tagged unclear: the relation between the paper passage and the cited Recognition theorem is ambiguous.
  Passage: "impose an appropriate capacity constraint on cross-variable information flow, forcing the model to forget redundant connections"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.