DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis

Haohao Qu; Qing Li; Rui An; Wenqi Fan; Xuequn Shang

arxiv: 2601.05527 · v2 · pith:GIZQBMO4new · submitted 2026-01-09 · 💻 cs.LG · cs.AI

Rui An, Haohao Qu, Wenqi Fan, Xuequn Shang, Qing Li

Pith reviewed 2026-05-21 16:17 UTC · model grok-4.3

keywords: Multivariate Time Series · Mamba Architecture · Dual-Path Modeling · Delay-Aware Attention · Time Series Forecasting · Anomaly Detection · Data Imputation · Series Classification

The pith

DeMa splits multivariate time series into separate temporal and variate paths using modified Mamba modules to model delays and interactions at linear cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeMa as a backbone that decomposes each multivariate series into one path for long-range dynamics inside each individual variable and a second path for cross-variable dependencies that include explicit time lags. It keeps the linear scaling of Mamba while adding two specialized modules: Mamba-SSD for independent series processing and Mamba-DALA for delay-aware linear attention across variables. If the separation works, the model should handle long sequences in forecasting, imputation, anomaly detection, and classification without the quadratic cost of attention-based methods. Experiments across five tasks are presented as evidence that both accuracy and speed improve over prior approaches.
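As a rough illustration of the data flow described above, the following toy sketch runs a per-series recurrence (a stand-in for the temporal path) and a lagged cross-variate average (a stand-in for the variate path), then mixes the two. Everything in it is an assumption made for illustration: the function names, the fixed `decay`, the hand-set `lags`, and the weighted-sum fusion with `alpha` and `beta` are hypothetical and are not taken from the paper's Mamba-SSD or Mamba-DALA definitions.

```python
from typing import List

Series = List[float]


def temporal_path(x: Series, decay: float = 0.9) -> Series:
    """Stand-in for the temporal path: a fixed linear recurrence per series.

    h_t = decay * h_{t-1} + (1 - decay) * x_t, computed in one O(T) pass.
    """
    h, out = 0.0, []
    for value in x:
        h = decay * h + (1.0 - decay) * value
        out.append(h)
    return out


def variate_path(X: List[Series], lags: List[int]) -> List[Series]:
    """Stand-in for the variate path: each series reads every other series
    shifted back by a per-source lag (hypothetical, hand-set here).

    Assumes all series share the same length. This toy mixing costs
    O(N^2 * T); it shows the data flow only, not a linear-cost mechanism.
    """
    n, T = len(X), len(X[0])
    out = []
    for i in range(n):
        mixed = []
        for t in range(T):
            # Lagged values from the other variates that exist at time t.
            vals = [X[j][t - lags[j]] for j in range(n)
                    if j != i and t - lags[j] >= 0]
            mixed.append(sum(vals) / len(vals) if vals else 0.0)
        out.append(mixed)
    return out


def dual_path(X: List[Series], lags: List[int],
              alpha: float = 0.5, beta: float = 0.5) -> List[Series]:
    """Assumed fusion: a weighted sum of the two paths."""
    temporal = [temporal_path(x) for x in X]  # independent per series
    variate = variate_path(X, lags)           # cross-variate, lag-aware
    return [[alpha * a + beta * b for a, b in zip(ta, va)]
            for ta, va in zip(temporal, variate)]


X = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
Y = dual_path(X, lags=[1, 0])
```

Because `temporal_path` never looks at another series, the list comprehension in `dual_path` could run each series in parallel, which is the property the summary attributes to the temporal path.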

Core claim

DeMa preserves Mamba's linear-complexity advantage while substantially improving its suitability for MTS settings by decomposing the input into intra-series temporal dynamics captured by a Mamba-SSD module and inter-series interactions captured by a Mamba-DALA module that integrates delay-aware linear attention.

What carries the argument

Dual-path decomposition consisting of a temporal path (Mamba-SSD for series-independent long-range dynamics) and a variate path (Mamba-DALA for delay-aware cross-variate dependencies).

If this is right

DeMa reaches state-of-the-art accuracy on long-term and short-term forecasting while using less compute.
The same architecture improves data imputation, anomaly detection, and series classification over prior models.
Linear complexity is retained, so the method scales to longer sequences than quadratic attention allows.
Series-independent parallel computation in the temporal path reduces overhead compared with fully coupled models.
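The linear-versus-quadratic distinction in the list above can be made concrete with the standard reordering behind linear attention. This is a generic sketch, not the paper's module: it drops the softmax (the identity only holds without it), ignores causal masking, and uses small hand-written matrices `Q`, `K`, `V` chosen purely for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product; cost is rows(A) * cols(A) * cols(B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def quadratic_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """(Q K^T) V: builds an L x L score matrix, so cost grows as L^2 * d."""
    return matmul(matmul(Q, transpose(K)), V)


def linear_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Q (K^T V): builds a d x d summary first, so cost grows as L * d^2."""
    return matmul(Q, matmul(transpose(K), V))


# Three tokens (L = 3) with two features each (d = 2).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
V = [[1.0, 0.0], [2.0, 1.0], [0.0, 4.0]]

slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
print(slow == fast)  # both orderings give the same result
```

Matrix multiplication is associative, so the two functions agree; only the size of the intermediate result changes. Any delay mechanism that forces the L x L matrix to be materialized would lose this saving.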

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same split could be tested on other linear-time sequence models to see whether the delay-aware component transfers beyond Mamba.
If the dual paths reduce memory usage in practice, the approach may enable deployment on edge devices for real-time multivariate monitoring.
Extending the delay modeling to non-stationary or irregularly sampled series would be a direct next measurement of robustness.

Load-bearing premise

The claim that explicitly separating intra-series dynamics from inter-series interactions with added delay modeling fully resolves the three listed limitations of vanilla Mamba without leaving modeling gaps.

What would settle it

A controlled comparison on the same five benchmark tasks where a plain Mamba or a standard Transformer reaches equal or better accuracy and wall-clock time would falsify the necessity of the dual-path design.

Figures

Figures reproduced from arXiv: 2601.05527 by Haohao Qu, Qing Li, Rui An, Wenqi Fan, Xuequn Shang.

**Figure 1.** Dependency modeling strategies and computational complexity of representative MTS architectures. (a) Tokenization and induced dependency patterns (variate-mixing, variate-independent, variate-dependent). (b) Complexity comparison of representative models. Here, T is the lookback length, N the number of variates, L the token length, and d the embedding dimension; in typical long-horizon settings, T > L ≫ N,…

**Figure 2.** The overall framework of DeMa. The proposed DeMa (Left) comprises three key com…

**Figure 3.** Performance comparison on classification and anomaly detection tasks. Results are averaged…

**Figure 4.** Model performance comparison (left) and efficiency comparison (right). DeMa achieves…

**Figure 5.** Efficiency analysis of GPU memory and running time in a long-term lookback-window…

**Figure 6.** Sensitivity of Fusion Weights α and β Across Tasks. Across all tasks, DeMa is most reliable when both paths remain active (i.e., neither α nor β is overly small), confirming that temporal modeling and cross-variate interaction are complementary rather than substitutable. Forecasting is relatively less sensitive to α and β (…

Original abstract

Accurate and efficient multivariate time series (MTS) analysis is increasingly critical for a wide range of intelligent applications. Within this realm, Transformers have emerged as the predominant architecture due to their strong ability to capture pairwise dependencies. However, Transformer-based models suffer from quadratic computational complexity and high memory overhead, limiting their scalability and practical deployment in long-term and large-scale MTS modeling. Recently, Mamba has emerged as a promising linear-time alternative with high expressiveness. Nevertheless, directly applying vanilla Mamba to MTS remains suboptimal due to three key limitations: (i) the lack of explicit cross-variate modeling, (ii) difficulty in disentangling the entangled intra-series temporal dynamics and inter-series interactions, and (iii) insufficient modeling of latent time-lag interaction effects. These issues constrain its effectiveness across diverse MTS tasks. To address these challenges, we propose DeMa, a dual-path delay-aware Mamba backbone. DeMa preserves Mamba's linear-complexity advantage while substantially improving its suitability for MTS settings. Specifically, DeMa introduces three key innovations: (i) it decomposes the MTS into intra-series temporal dynamics and inter-series interactions; (ii) it develops a temporal path with a Mamba-SSD module to capture long-range dynamics within each individual series, enabling series-independent, parallel computation; and (iii) it designs a variate path with a Mamba-DALA module that integrates delay-aware linear attention to model cross-variate dependencies. Extensive experiments on five representative tasks, long- and short-term forecasting, data imputation, anomaly detection, and series classification, demonstrate that DeMa achieves state-of-the-art performance while delivering remarkable computational efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DeMa adds a dual-path Mamba split for MTS that separates intra-series dynamics from cross-variate lags, but the delay mechanism's linearity needs direct verification from the equations.

The main thing to know is that DeMa decomposes multivariate time series into a temporal path using Mamba-SSD for series-independent long-range modeling and a variate path using Mamba-DALA that adds delay-aware linear attention for cross-variate and time-lag effects, all while keeping overall linear complexity. This directly targets the three limitations the authors list for vanilla Mamba on MTS data: missing explicit cross-variate terms, entangled intra- and inter-series signals, and weak handling of latent lags. The design choice to enable parallel computation on the temporal side is practical for scaling to longer sequences.

The reported results across long- and short-term forecasting, imputation, anomaly detection, and classification, plus efficiency numbers, give the work some grounding if the baselines and ablations are standard and the gains are consistent rather than marginal. Credit is due for shipping a concrete architecture that extends recent Mamba work without obvious circularity or self-referential fitting.

The soft spot sits in the Mamba-DALA module. The central claim rests on the delay-aware linear attention actually capturing variable time-lag interactions without reintroducing quadratic costs or leaving modeling gaps on datasets with complex or long lags. If the implementation uses a restricted lag set or falls back to dense operations for arbitrary delays, the promised disentanglement would be incomplete and the efficiency edge could shrink. I would want to see the exact state-transition modifications or kernel definitions to judge this.

This paper is for practitioners and researchers building efficient models for long multivariate forecasting or monitoring tasks who already follow the Mamba line of work. A reader looking for drop-in linear alternatives to Transformers on public MTS benchmarks would get the most from the design and results.

It deserves a serious referee because the core decomposition is a reasonable, testable extension and the evaluation scope is broad enough to warrant detailed review rather than desk rejection.
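To make the letter's worry about a "restricted lag set" concrete, here is a minimal sketch of the classical way to estimate a delay between two series: scan a bounded set of candidate lags and keep the one with the highest cross-correlation. This is a textbook baseline written for illustration, not the paper's delay mechanism; `estimate_lag`, `max_lag`, and the synthetic data are all hypothetical.

```python
import random
from typing import List


def estimate_lag(x: List[float], y: List[float], max_lag: int) -> int:
    """Return the lag k in [0, max_lag] that maximizes the mean of
    x[t - k] * y[t], i.e. how many steps y trails behind x."""
    best_lag, best_score = 0, float("-inf")
    for k in range(max_lag + 1):
        products = [x[t - k] * y[t] for t in range(k, len(y))]
        score = sum(products) / len(products)
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
true_lag = 3
# y is x delayed by true_lag steps (zero-padded at the start).
y = [0.0] * true_lag + x[:-true_lag]

lag = estimate_lag(x, y, max_lag=10)
```

The brute-force search costs O(T · max_lag) per pair of series, and running it over every pair of N variates multiplies that by N². That is exactly the kind of pairwise cost the letter asks the authors to rule out, or to bound, in their formulation.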

Referee Report

2 major / 2 minor

Summary. The manuscript proposes DeMa, a dual-path delay-aware Mamba backbone for multivariate time series analysis. It decomposes MTS data into intra-series temporal dynamics (via a Mamba-SSD module in the temporal path) and inter-series interactions (via a Mamba-DALA module with delay-aware linear attention in the variate path). The work claims this addresses three limitations of vanilla Mamba (lack of explicit cross-variate modeling, entangled intra- and inter-series dynamics, and insufficient modeling of latent time-lag effects) while retaining linear complexity and outperforming Transformer-based models. Extensive experiments across five tasks (long- and short-term forecasting, data imputation, anomaly detection, and series classification) are reported to demonstrate state-of-the-art performance and computational efficiency.

Significance. If the central claims hold, the contribution would be significant for efficient MTS modeling. The dual-path design and explicit handling of time-lag effects via linear mechanisms could provide a scalable alternative to quadratic Transformer architectures, with potential impact on long-sequence applications. The emphasis on disentangling intra- and inter-series components while preserving Mamba's efficiency is a clear strength, particularly if the delay-aware component is shown to be both effective and complexity-preserving.

major comments (2)

[Abstract] Abstract: The central claim that Mamba-DALA 'integrates delay-aware linear attention' to model cross-variate dependencies and latent time-lag interaction effects lacks any supporting equation, state-transition modification, kernel definition, or pseudocode. This is load-bearing because the efficiency advantage and disentanglement rest on the mechanism preserving linear complexity for arbitrary or variable lags; without the concrete formulation it is impossible to rule out fallback to dense attention or restriction to a fixed small lag set.
[§3 (Architecture description)] The description of the variate path (Mamba-DALA) does not specify how delays are injected (e.g., via modified selective state transitions, lag-specific kernels, or adjusted scanning) while keeping overall complexity linear. If the implementation either reintroduces quadratic terms for long lags or uses a small fixed lag set, the claimed modeling of entangled inter-series dynamics would be incomplete, directly undermining the SOTA and efficiency results on the five tasks.
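One generic way a relative offset can enter an attention score without any pairwise lookup table is a rotary (RoPE-style) phase: rotate queries and keys by an angle proportional to their position, and the dot product then depends only on the position difference. The sketch below demonstrates that property for a single 2-D feature pair. It is offered as an assumption about what a "token-level relative delay" term could look like, not as the paper's formulation; `rotate`, `score`, and `theta` are hypothetical names.

```python
import cmath


def rotate(vec: complex, position: int, theta: float = 0.1) -> complex:
    """Rotary encoding of one 2-D feature pair, stored as a complex number:
    multiply by e^{i * position * theta}."""
    return vec * cmath.exp(1j * position * theta)


def score(q: complex, k: complex, m: int, n: int) -> float:
    """Dot product of the rotated query (position m) and key (position n).

    For complex a and b, Re(a * conj(b)) equals the real 2-D dot product.
    """
    return (rotate(q, m) * rotate(k, n).conjugate()).real


q, k = complex(1.0, 2.0), complex(0.5, -1.0)

near = score(q, k, m=5, n=2)      # offset m - n = 3
far = score(q, k, m=105, n=102)   # same offset, shifted by 100 steps
other = score(q, k, m=5, n=0)     # offset 5
```

Here `near` and `far` agree up to floating-point error while `other` differs, because the two phases combine into a single factor that depends on `m - n` alone. Since the rotation is applied to each token independently, it composes with the linear-attention reordering and does not require an L x L matrix.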

minor comments (2)

[Abstract] The abstract states 'extensive experiments' demonstrate SOTA results but does not reference specific tables, metrics (e.g., MSE, MAE), baselines, or statistical tests; these details should be summarized with pointers to the relevant result sections or tables.
Notation for the two paths (temporal vs. variate) and the modules (Mamba-SSD, Mamba-DALA) should be introduced with consistent symbols or diagrams early in the architecture section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address the concerns about the clarity of the Mamba-DALA formulation by adding explicit equations, a complexity analysis, and pseudocode.

Referee: [Abstract] The central claim that Mamba-DALA 'integrates delay-aware linear attention' to model cross-variate dependencies and latent time-lag interaction effects lacks any supporting equation, state-transition modification, kernel definition, or pseudocode. This is load-bearing because the efficiency advantage and disentanglement rest on the mechanism preserving linear complexity for arbitrary or variable lags; without the concrete formulation it is impossible to rule out fallback to dense attention or restriction to a fixed small lag set.

Authors: We thank the referee for this observation. The abstract is intentionally high-level; the concrete formulation appears in Section 3.2, where delay-aware linear attention is realized by modifying the selective state transitions with lag-specific parameters inside the linear kernel. To improve clarity we have inserted the defining equations for the modified state update and the lag kernel, together with a short complexity argument showing the scan remains strictly linear in sequence length for arbitrary lags. Pseudocode is now also provided in the appendix.
Revision: yes
Referee: [§3 (Architecture description)] The description of the variate path (Mamba-DALA) does not specify how delays are injected (e.g., via modified selective state transitions, lag-specific kernels, or adjusted scanning) while keeping overall complexity linear. If the implementation either reintroduces quadratic terms for long lags or uses a small fixed lag set, the claimed modeling of entangled inter-series dynamics would be incomplete, directly undermining the SOTA and efficiency results on the five tasks.

Authors: We appreciate the referee drawing attention to this potential ambiguity. In the original text, delays are injected by lag-specific kernels that adjust the selective parameters of the Mamba scan; the overall procedure stays linear because the attention is computed via a single selective state-space pass rather than pairwise operations. Nevertheless, we agree the exposition can be tightened. The revised Section 3 now contains an explicit step-by-step derivation of the delay injection, the corresponding kernel definition, and a formal complexity proof confirming O(N) scaling even for variable or long lags. These additions directly support the reported efficiency and performance claims.
Revision: yes
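The rebuttal's contrast between "a single selective state-space pass" and "pairwise operations" can be checked on the simplest possible case. The sketch below uses a scalar state with per-step coefficients `a`, `b`, `c` supplied as plain lists; these names and values are illustrative assumptions, and the code shows the generic scalar recurrence rather than the paper's delay-aware kernel.

```python
from typing import List


def scan_form(a: List[float], b: List[float], c: List[float],
              x: List[float]) -> List[float]:
    """Recurrent form: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t.

    One pass over the sequence: O(T) time, O(1) state.
    """
    h, ys = 0.0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


def pairwise_form(a: List[float], b: List[float], c: List[float],
                  x: List[float]) -> List[float]:
    """Equivalent masked attention-like form:
    y_t = sum over s <= t of c_t * (a_{s+1} * ... * a_t) * b_s * x_s.

    Visits every (t, s) pair, so at least O(T^2); this naive loop also
    recomputes each decay product from scratch.
    """
    ys = []
    for t in range(len(x)):
        total = 0.0
        for s in range(t + 1):
            decay = 1.0
            for r in range(s + 1, t + 1):
                decay *= a[r]
            total += c[t] * decay * b[s] * x[s]
        ys.append(total)
    return ys


a = [0.5, 0.5, 0.5]
b = [1.0, 1.0, 1.0]
c = [1.0, 2.0, 1.0]
x = [1.0, 2.0, 3.0]

recurrent = scan_form(a, b, c, x)
pairwise = pairwise_form(a, b, c, x)
```

Both functions return `[1.0, 5.0, 4.25]` on this input. The referee's question amounts to whether the lag-specific terms can be folded into `a_t`, `b_t`, and `c_t` so that the left-hand form still applies, or whether they force the right-hand form.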

Circularity Check

0 steps flagged

No significant circularity: DeMa defines new modules and validates empirically on external tasks

The paper proposes DeMa as a dual-path architecture that decomposes multivariate time series into intra-series temporal dynamics (via Mamba-SSD) and inter-series interactions (via Mamba-DALA with delay-aware linear attention). These components are introduced as explicit innovations to address stated limitations of vanilla Mamba, with the overall model evaluated through experiments on five standard external tasks. No equations, predictions, or central claims reduce by construction to fitted parameters, self-citations, or renamed inputs; the derivation remains a sequence of architectural definitions followed by independent empirical results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Based solely on the abstract, the central claim rests on the introduction of two new modules whose internal mechanisms and any associated parameters are not detailed; no standard mathematical axioms or external benchmarks are invoked in the summary.

invented entities (2)

Mamba-SSD module · no independent evidence
purpose: Capture long-range dynamics within each individual series for series-independent parallel computation
New component introduced in the temporal path of DeMa.

Mamba-DALA module · no independent evidence
purpose: Integrate delay-aware linear attention to model cross-variate dependencies and time-lag effects
New component introduced in the variate path of DeMa.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArrowOfTime.lean · arrow_from_z
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: DeMa introduces three key innovations: (i) it decomposes the MTS into intra-series temporal dynamics and inter-series interactions; (ii) temporal path with Mamba-SSD ... (iii) variate path with Mamba-DALA that integrates delay-aware linear attention

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Mamba-DALA ... global correlation delay ... token-level relative delay ... RoPE

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 7 internal anchors

[1]

Deep learning for time series forecasting: a survey.International Journal of Machine Learning and Cybernetics, pages 1–34, 2025

Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. Deep learning for time series forecasting: a survey.International Journal of Machine Learning and Cybernetics, pages 1–34, 2025

work page 2025
[2]

Guangyu Huo, Yong Zhang, Boyue Wang, Junbin Gao, Yongli Hu, and Baocai Yin. Hierarchical spatio– temporal graph convolutional networks and transformer network for traffic flow forecasting.IEEE Transactions on Intelligent Transportation Systems, 24(4):3855–3867, 2023

work page 2023
[3]

Damba-st: Domain-adaptive mamba for efficient urban spatio-temporal prediction.arXiv preprint arXiv:2506.18939, 2025

Rui An, Yifeng Zhang, Ziran Liang, Wenqi Fan, Yuxuan Liang, Xuequn Shang, and Qing Li. Damba-st: Domain-adaptive mamba for efficient urban spatio-temporal prediction.arXiv preprint arXiv:2506.18939, 2025

work page arXiv 2025
[4]

Lara: A light and anti-overfitting retraining approach for unsupervised time series anomaly detection

Feiyi Chen, Zhen Qin, Mengchu Zhou, Yingying Zhang, Shuiguang Deng, Lunting Fan, Guansong Pang, and Qingsong Wen. Lara: A light and anti-overfitting retraining approach for unsupervised time series anomaly detection. InProceedings of the ACM on Web Conference 2024, pages 4138–4149, 2024

work page 2024
[5]

A review on outlier/anomaly detection in time series data.ACM computing surveys (CSUR), 54(3):1–33, 2021

Ane Bl´azquez-Garc´ıa, Angel Conde, Usue Mori, and Jose A Lozano. A review on outlier/anomaly detection in time series data.ACM computing surveys (CSUR), 54(3):1–33, 2021

work page 2021
[6]

Imaging and fusing time series for wearable sensor-based human activity recognition.Information Fusion, 53:80–87, 2020

Zhen Qin, Yibo Zhang, Shuyu Meng, Zhiguang Qin, and Kim-Kwang Raymond Choo. Imaging and fusing time series for wearable sensor-based human activity recognition.Information Fusion, 53:80–87, 2020

work page 2020
[7]

Attention is all you need.Advances in Neural Information Processing Systems, 2017

A Vaswani. Attention is all you need.Advances in Neural Information Processing Systems, 2017

work page 2017
[8]

A time series is worth 64 words: Long-term forecasting with transformers

Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[9]

Timer: Generative pre-trained transformers are large time series models

Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Timer: Generative pre-trained transformers are large time series models. InForty-first International Conference on Machine Learning, 2024

work page 2024
[10]

Unified training of universal time series forecasting transformers

Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. InForty-first International Conference on Machine Learning, 2024

work page 2024
[11]

itrans- former: Inverted transformers are effective for time series forecasting

Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itrans- former: Inverted transformers are effective for time series forecasting. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[12]

Timesnet: Temporal 2d-variation modeling for general time series analysis

Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. Timesnet: Temporal 2d-variation modeling for general time series analysis. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[13]

itfkan: Interpretable time series forecasting with kolmogorov-arnold network.arXiv preprint arXiv:2504.16432, 2025

Ziran Liang, Rui An, Wenqi Fan, Yanghui Rao, and Yuxuan Liang. itfkan: Interpretable time series forecasting with kolmogorov-arnold network.arXiv preprint arXiv:2504.16432, 2025

work page arXiv 2025
[14]

Recurrent neural networks for time series forecasting: Current status and future directions.International Journal of Forecasting, 37(1):388–427, 2021

Hansika Hewamalage, Christoph Bergmeir, and Kasun Bandara. Recurrent neural networks for time series forecasting: Current status and future directions.International Journal of Forecasting, 37(1):388–427, 2021. 22

work page 2021
[15]

Recurrent neural networks for time series classification.Neurocomputing, 50:223–235, 2003

Michael H¨usken and Peter Stagge. Recurrent neural networks for time series classification.Neurocomputing, 50:223–235, 2003

work page 2003
[16]

Scinet: Time series modeling and forecasting with sample convolution and interaction.Advances in Neural Information Processing Systems, 35:5816–5828, 2022

Minhao Liu, Ailing Zeng, Muxi Chen, Zhijian Xu, Qiuxia Lai, Lingna Ma, and Qiang Xu. Scinet: Time series modeling and forecasting with sample convolution and interaction.Advances in Neural Information Processing Systems, 35:5816–5828, 2022

work page 2022
[17]

Moderntcn: A modern pure convolution structure for general time series analysis

Donghao Luo and Xue Wang. Moderntcn: A modern pure convolution structure for general time series analysis. InICLR, 2024

work page 2024
[18]

Are transformers effective for time series forecasting? InProceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121–11128, 2023

Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? InProceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121–11128, 2023

work page 2023
[19]

Timemixer: Decomposable multiscale mixing for time series forecasting

Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y Zhang, and JUN ZHOU. Timemixer: Decomposable multiscale mixing for time series forecasting. InInternational Conference on Learning Representations (ICLR), 2024

work page 2024
[20]

Informer: Beyond efficient transformer for long sequence time-series forecasting

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. InProceedings of the AAAI conference on artificial intelligence, volume 35, pages 11106–11115, 2021

work page 2021
[21]

Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting

Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. InInternational conference on machine learning, pages 27268–27286. PMLR, 2022

work page 2022
[22]

Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting.Advances in neural information processing systems, 34:22419–22430, 2021

Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting.Advances in neural information processing systems, 34:22419–22430, 2021

work page 2021
[23]

Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting

Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X Liu, and Schahram Dustdar. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. InInternational conference on learning representations, 2021

work page 2021
[24]

Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting

Yunhao Zhang and Junchi Yan. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. InThe eleventh international conference on learning representations, 2023

work page 2023
[25]

Tsmixer: Lightweight mlp-mixer model for multivariate time series forecasting

Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. Tsmixer: Lightweight mlp-mixer model for multivariate time series forecasting. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 459–469, 2023

work page 2023
[26]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces.arXiv preprint arXiv:2312.00752, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[27]

Transformers are ssms: generalized models and efficient algorithms through structured state space duality

Tri Dao and Albert Gu. Transformers are ssms: generalized models and efficient algorithms through structured state space duality. InProceedings of the 41st International Conference on Machine Learning, pages 10041–10071, 2024

work page 2024
[28]

A Survey of Mamba

Haohao Qu, Liangbo Ning, Rui An, Wenqi Fan, Tyler Derr, Xin Xu, and Qing Li. A survey of mamba. arXiv preprint arXiv:2408.01129, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[29]

Jamba: A Hybrid Transformer-Mamba Language Model

Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, et al. Jamba: A hybrid transformer-mamba language model.arXiv preprint arXiv:2403.19887, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[30]

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision mamba: Efficient visual representation learning with bidirectional state space model.arXiv preprint arXiv:2401.09417, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[31]

Caduceus: Bi-directional equivariant long-range dna sequence modeling

Yair Schiff, Chia Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, and V olodymyr Kuleshov. Caduceus: Bi-directional equivariant long-range dna sequence modeling. InInternational Conference on Machine Learning, pages 43632–43648. PMLR, 2024

work page 2024
[32]

Ssd4rec: A structured state space duality model for efficient sequential recommendation.arXiv preprint arXiv:2409.01192, 2024

Haohao Qu, Yifeng Zhang, Liangbo Ning, Wenqi Fan, and Qing Li. Ssd4rec: A structured state space duality model for efficient sequential recommendation.arXiv preprint arXiv:2409.01192, 2024

work page arXiv 2024
[33]

Deep Time Series Models: A Comprehensive Survey and Benchmark

Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, and Jianmin Wang. Deep time series models: A comprehensive survey and benchmark.arXiv preprint arXiv:2407.13278, 2024. 23

work page internal anchor Pith review Pith/arXiv arXiv 2024
[34]

Rethinking channel dependence for multivariate time series forecasting: Learning from leading indicators

Lifan Zhao and Yanyan Shen. Rethinking channel dependence for multivariate time series forecasting: Learning from leading indicators. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[35]

Unveiling delay effects in traffic forecasting: A perspective from spatial-temporal delay differential equations

Qingqing Long, Zheng Fang, Chen Fang, Chong Chen, Pengfei Wang, and Yuanchun Zhou. Unveiling delay effects in traffic forecasting: A perspective from spatial-temporal delay differential equations. In Proceedings of the ACM on Web Conference 2024, pages 1035–1044, 2024

work page 2024
[36]

Pdformer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction

Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, and Jingyuan Wang. Pdformer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. InProceedings of the AAAI conference on artificial intelligence, volume 37, pages 4365–4373, 2023

work page 2023
[37]

Hippo: Recurrent memory with optimal polynomial projections.Advances in neural information processing systems, 33:1474–1487, 2020

Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher R ´e. Hippo: Recurrent memory with optimal polynomial projections.Advances in neural information processing systems, 33:1474–1487, 2020

work page 2020
[38]

Parallel prefix sum (scan) with cuda.GPU gems, 3(39):851–876, 2007

Mark Harris, Shubhabrata Sengupta, and John D Owens. Parallel prefix sum (scan) with cuda.GPU gems, 3(39):851–876, 2007

work page 2007
[39]

Frequency-domain mlps are more effective learners in time series forecasting

Kun Yi, Qi Zhang, Wei Fan, Shoujin Wang, Pengyang Wang, Hui He, Ning An, Defu Lian, Longbing Cao, and Zhendong Niu. Frequency-domain mlps are more effective learners in time series forecasting. Advances in Neural Information Processing Systems, 36:76656–76679, 2023

work page 2023
[40]

Yong Liu, Chenyu Li, Jianmin Wang, and Mingsheng Long. Koopa: Learning non-stationary time series dynamics with koopman predictors. Advances in neural information processing systems, 36:12271–12290, 2023
[41]

Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu. Ts2vec: Towards universal representation of time series. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 8980–8987, 2022
[42]

Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, and Christopher Ré. How to train your hippo: State space models with generalized orthogonal basis projections. arXiv preprint arXiv:2206.12037, 2022
[43]

Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations, 2021
[44]

Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, and Gao Huang. Demystify mamba in vision: A linear attention perspective. arXiv preprint arXiv:2405.16605, 2024
[45]

Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024
[46]

Mordechai Azaria and David Hertz. Time delay estimation by generalized cross correlation methods. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):280–285, 1984
[47]

Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, and Gao Huang. Flatten transformer: Vision transformer using focused linear attention. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5961–5971, 2023
[48]

Artur Trindade. ElectricityLoadDiagrams20112014. UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C58C86
[49]

Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long- and short-term temporal patterns with deep neural networks. In The 41st international ACM SIGIR conference on research & development in information retrieval, pages 95–104, 2018
[50]

Chao Chen, Karl Petty, Alexander Skabardonis, Pravin Varaiya, and Zhanfeng Jia. Freeway performance measurement system: mining loop detector data. Transportation research record, 1748(1):96–102, 2001
[51]

Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, and Dan Pei. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2828–2837, 2019
[52]

Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Soderstrom. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pages 387–395, 2018
[53]

Aditya P Mathur and Nils Ole Tippenhauer. Swat: A water treatment testbed for research and training on ics security. In CySWater, 2016
[54]

Ahmed Abdulaal, Zhuanghua Liu, and Tomer Lancewicki. Practical approach to asynchronous multivariate time series anomaly detection and localization. KDD, 2021
[56]

Yuhan Wu, Xiyu Meng, Huajin Hu, Junru Zhang, Yabo Dong, and Dongming Lu. Affirm: Interactive mamba with adaptive fourier filters for long-term time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 21599–21607, 2025
[57]

Zihan Wang, Fanheng Kong, Shi Feng, Ming Wang, Xiaocui Yang, Han Zhao, Daling Wang, and Yifei Zhang. Is mamba effective for time series forecasting? Neurocomputing, 619:129178, 2025
[59]

Zixuan Weng, Jindong Han, Wenzhao Jiang, and Hao Liu. Simplified mamba with disentangled dependency encoding for long-term time series forecasting. arXiv preprint arXiv:2408.12068, 2024
[60]

Zhe Li, Shiyi Qi, Yiduo Li, and Zenglin Xu. Revisiting long-term time series forecasting: An investigation on linear mapping. arXiv preprint arXiv:2305.10721, 2023
[61]

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014
[62]

Anthony Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn Keogh. The UEA multivariate time series classification archive, 2018. arXiv preprint arXiv:1811.00075, 2018
[63]

Xu Liu, Junfeng Hu, Yuan Li, Shizhe Diao, Yuxuan Liang, Bryan Hooi, and Roger Zimmermann. Unitime: A language-empowered unified model for cross-domain time series forecasting. In Proceedings of the ACM on Web Conference 2024, pages 4095–4106, 2024
[64]

Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya. Reformer: The efficient transformer. In International Conference on Learning Representations, 2019
[65]

Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, and Xifeng Yan. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Advances in neural information processing systems, 32, 2019
[66]

Chaolv Zeng, Zhanyu Liu, Guanjie Zheng, and Linghe Kong. C-mamba: Channel correlation enhanced state space models for multivariate time series forecasting. arXiv preprint arXiv:2406.05316, 2024
[67]

Ali Behrouz, Michele Santacatterina, and Ramin Zabih. Mambamixer: Efficient selective state space models with dual token and channel selection. arXiv preprint arXiv:2403.19888, 2024
[68]

Xiuding Cai, Yaoyao Zhu, Xueyao Wang, and Yu Yao. Mambats: Improved selective state space models for long-term time series forecasting. arXiv preprint arXiv:2405.16440, 2024
