DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis
Pith reviewed 2026-05-21 16:17 UTC · model grok-4.3
The pith
DeMa splits multivariate time series into separate temporal and variate paths using modified Mamba modules to model delays and interactions at linear cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DeMa preserves Mamba's linear-complexity advantage while substantially improving its suitability for MTS settings by decomposing the input into intra-series temporal dynamics captured by a Mamba-SSD module and inter-series interactions captured by a Mamba-DALA module that integrates delay-aware linear attention.
What carries the argument
Dual-path decomposition consisting of a temporal path (Mamba-SSD for series-independent long-range dynamics) and a variate path (Mamba-DALA for delay-aware cross-variate dependencies).
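The decomposition can be pictured with a toy sketch. Everything here is an assumption for illustration: the exponential-smoothing recurrence stands in for a state-space scan, mean mixing stands in for cross-variate attention, and summing the two outputs is a hypothetical fusion rule. None of it is taken from the paper's Mamba-SSD or Mamba-DALA definitions, which the abstract does not give.

```python
from typing import List

Series = List[float]


def temporal_path(x: List[Series], decay: float = 0.9) -> List[Series]:
    """Process each variate independently along time with a linear recurrence.

    Stand-in for a state-space scan: h_t = decay * h_{t-1} + (1 - decay) * x_t.
    """
    out = []
    for series in x:  # series do not interact here, so this loop could run in parallel
        h, states = 0.0, []
        for value in series:
            h = decay * h + (1.0 - decay) * value
            states.append(h)
        out.append(states)
    return out


def variate_path(x: List[Series]) -> List[Series]:
    """Mix information across variates at each time step (plain mean mixing)."""
    n_vars, n_steps = len(x), len(x[0])
    out = [[0.0] * n_steps for _ in range(n_vars)]
    for t in range(n_steps):
        mean_t = sum(x[v][t] for v in range(n_vars)) / n_vars
        for v in range(n_vars):
            out[v][t] = mean_t
    return out


def dual_path(x: List[Series]) -> List[Series]:
    """Sum the two paths; the fusion rule is an assumption for illustration."""
    t_out, v_out = temporal_path(x), variate_path(x)
    return [[a + b for a, b in zip(ts, vs)] for ts, vs in zip(t_out, v_out)]


x = [[1.0, 2.0, 3.0, 4.0],   # variate 0
     [0.0, 0.0, 1.0, 1.0]]   # variate 1
y = dual_path(x)             # same shape as x: 2 variates by 4 time steps
```

The point of the sketch is only the separation of concerns: the temporal path never looks across variates, and the variate path never looks across time.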
If this is right
- DeMa reaches state-of-the-art accuracy on long-term and short-term forecasting while using less compute.
- The same architecture improves data imputation, anomaly detection, and series classification over prior models.
- Linear complexity is retained, so the method scales to longer sequences than quadratic attention allows.
- Series-independent parallel computation in the temporal path reduces overhead compared with fully coupled models.
Where Pith is reading between the lines
- The same split could be tested on other linear-time sequence models to see whether the delay-aware component transfers beyond Mamba.
- If the dual paths reduce memory usage in practice, the approach may enable deployment on edge devices for real-time multivariate monitoring.
- Extending the delay modeling to non-stationary or irregularly sampled series would be a direct next measurement of robustness.
Load-bearing premise
The claim that explicitly separating intra-series dynamics from inter-series interactions with added delay modeling fully resolves the three listed limitations of vanilla Mamba without leaving modeling gaps.
What would settle it
A controlled comparison on the same five benchmark tasks where a plain Mamba or a standard Transformer reaches equal or better accuracy and wall-clock time would falsify the necessity of the dual-path design.
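A controlled comparison of this kind needs each model scored on identical inputs, with accuracy and wall-clock time recorded together. The harness below is a minimal sketch under stated assumptions: `last_value` and `mean_value` are hypothetical placeholder forecasters, not DeMa, Mamba, or a Transformer, and MSE and MAE are assumed as the accuracy metrics.

```python
import time
from typing import Callable, Dict, List

Forecaster = Callable[[List[float], int], List[float]]


def mse(pred: List[float], target: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mae(pred: List[float], target: List[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def last_value(history: List[float], horizon: int) -> List[float]:
    """Hypothetical placeholder model: repeat the last observation."""
    return [history[-1]] * horizon


def mean_value(history: List[float], horizon: int) -> List[float]:
    """Hypothetical placeholder model: repeat the historical mean."""
    return [sum(history) / len(history)] * horizon


def evaluate(model: Forecaster, history: List[float],
             target: List[float]) -> Dict[str, float]:
    """Score one model on one split, timing only the forward call."""
    start = time.perf_counter()
    pred = model(history, len(target))
    elapsed = time.perf_counter() - start
    return {"mse": mse(pred, target), "mae": mae(pred, target), "seconds": elapsed}


history = [1.0, 2.0, 3.0, 4.0]
target = [4.0, 6.0]
models = {"last_value": last_value, "mean_value": mean_value}
results = {name: evaluate(fn, history, target) for name, fn in models.items()}
```

In a real study the same split, seed, and hardware would be held fixed across models, and timings would be averaged over repeated runs.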
Original abstract
Accurate and efficient multivariate time series (MTS) analysis is increasingly critical for a wide range of intelligent applications. Within this realm, Transformers have emerged as the predominant architecture due to their strong ability to capture pairwise dependencies. However, Transformer-based models suffer from quadratic computational complexity and high memory overhead, limiting their scalability and practical deployment in long-term and large-scale MTS modeling. Recently, Mamba has emerged as a promising linear-time alternative with high expressiveness. Nevertheless, directly applying vanilla Mamba to MTS remains suboptimal due to three key limitations: (i) the lack of explicit cross-variate modeling, (ii) difficulty in disentangling the entangled intra-series temporal dynamics and inter-series interactions, and (iii) insufficient modeling of latent time-lag interaction effects. These issues constrain its effectiveness across diverse MTS tasks. To address these challenges, we propose DeMa, a dual-path delay-aware Mamba backbone. DeMa preserves Mamba's linear-complexity advantage while substantially improving its suitability for MTS settings. Specifically, DeMa introduces three key innovations: (i) it decomposes the MTS into intra-series temporal dynamics and inter-series interactions; (ii) it develops a temporal path with a Mamba-SSD module to capture long-range dynamics within each individual series, enabling series-independent, parallel computation; and (iii) it designs a variate path with a Mamba-DALA module that integrates delay-aware linear attention to model cross-variate dependencies. Extensive experiments on five representative tasks, long- and short-term forecasting, data imputation, anomaly detection, and series classification, demonstrate that DeMa achieves state-of-the-art performance while delivering remarkable computational efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DeMa, a dual-path delay-aware Mamba backbone for multivariate time series analysis. It decomposes MTS data into intra-series temporal dynamics (via a Mamba-SSD module in the temporal path) and inter-series interactions (via a Mamba-DALA module with delay-aware linear attention in the variate path). The work claims that this addresses three limitations of vanilla Mamba (lack of explicit cross-variate modeling, entangled intra- and inter-series dynamics, and insufficient modeling of latent time-lag effects) while retaining linear complexity and outperforming Transformer-based models. Extensive experiments across five tasks (long- and short-term forecasting, data imputation, anomaly detection, and series classification) are reported to demonstrate state-of-the-art performance and computational efficiency.
Significance. If the central claims hold, the contribution would be significant for efficient MTS modeling. The dual-path design and explicit handling of time-lag effects via linear mechanisms could provide a scalable alternative to quadratic Transformer architectures, with potential impact on long-sequence applications. The emphasis on disentangling intra- and inter-series components while preserving Mamba's efficiency is a clear strength, particularly if the delay-aware component is shown to be both effective and complexity-preserving.
major comments (2)
- [Abstract] The central claim that Mamba-DALA 'integrates delay-aware linear attention' to model cross-variate dependencies and latent time-lag interaction effects lacks any supporting equation, state-transition modification, kernel definition, or pseudocode. This is load-bearing because the efficiency advantage and disentanglement rest on the mechanism preserving linear complexity for arbitrary or variable lags; without the concrete formulation it is impossible to rule out fallback to dense attention or restriction to a fixed small lag set.
- [§3 (Architecture description)] The description of the variate path (Mamba-DALA) does not specify how delays are injected (e.g., via modified selective state transitions, lag-specific kernels, or adjusted scanning) while keeping overall complexity linear. If the implementation either reintroduces quadratic terms for long lags or uses a small fixed lag set, the claimed modeling of entangled inter-series dynamics would be incomplete, directly undermining the SOTA and efficiency results on the five tasks.
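The concern about falling back to dense attention can be made concrete with generic kernelized linear attention. The sketch below is not the paper's Mamba-DALA, and the feature map `phi(x) = 1 + max(x, 0)` is an assumed choice. It only shows the standard reassociation: computing `phi(K)^T V` first gives the same output as forming the full token-by-token score matrix, without ever materializing that matrix.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def feature_map(a: Matrix) -> Matrix:
    """Positive feature map phi(x) = 1 + max(x, 0); an assumed choice."""
    return [[1.0 + max(v, 0.0) for v in row] for row in a]


def quadratic_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    fq, fk = feature_map(q), feature_map(k)
    scores = matmul(fq, transpose(fk))      # n x n matrix: cost grows as n^2
    out = matmul(scores, v)
    return [[o / sum(row) for o in out_row] for out_row, row in zip(out, scores)]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)           # d x d_v summary, one pass over tokens
    k_sum = [sum(col) for col in zip(*fk)]  # length-d normalizer
    out = matmul(fq, kv)                    # n x d_v, no n x n matrix is formed
    denom = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[o / d for o in out_row] for out_row, d in zip(out, denom)]


q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.8]]
k = [[0.7, 0.1], [-0.3, 0.6], [0.2, -0.5]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
dense = quadratic_attention(q, k, v)
linear = linear_attention(q, k, v)
max_gap = max(abs(a - b) for ra, rb in zip(dense, linear) for a, b in zip(ra, rb))
```

Whether a delay term can be folded into this factorization without reintroducing pairwise scores is exactly what the referee asks the authors to show.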
minor comments (2)
- [Abstract] The abstract states 'extensive experiments' demonstrate SOTA results but does not reference specific tables, metrics (e.g., MSE, MAE), baselines, or statistical tests; these details should be summarized with pointers to the relevant result sections or tables.
- Notation for the two paths (temporal vs. variate) and the modules (Mamba-SSD, Mamba-DALA) should be introduced with consistent symbols or diagrams early in the architecture section to improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address the concerns about the clarity of the Mamba-DALA formulation by adding explicit equations, a complexity analysis, and pseudocode.
Point-by-point responses
- Referee: [Abstract] The central claim that Mamba-DALA 'integrates delay-aware linear attention' to model cross-variate dependencies and latent time-lag interaction effects lacks any supporting equation, state-transition modification, kernel definition, or pseudocode. This is load-bearing because the efficiency advantage and disentanglement rest on the mechanism preserving linear complexity for arbitrary or variable lags; without the concrete formulation it is impossible to rule out fallback to dense attention or restriction to a fixed small lag set.
  Authors: We thank the referee for this observation. The abstract is intentionally high-level; the concrete formulation appears in Section 3.2, where delay-aware linear attention is realized by modifying the selective state transitions with lag-specific parameters inside the linear kernel. To improve clarity we have inserted the defining equations for the modified state update and the lag kernel, together with a short complexity argument showing the scan remains strictly linear in sequence length for arbitrary lags. Pseudocode is now also provided in the appendix.
  Revision: yes
- Referee: [§3 (Architecture description)] The description of the variate path (Mamba-DALA) does not specify how delays are injected (e.g., via modified selective state transitions, lag-specific kernels, or adjusted scanning) while keeping overall complexity linear. If the implementation either reintroduces quadratic terms for long lags or uses a small fixed lag set, the claimed modeling of entangled inter-series dynamics would be incomplete, directly undermining the SOTA and efficiency results on the five tasks.
  Authors: We appreciate the referee drawing attention to this potential ambiguity. In the original text, delays are injected by lag-specific kernels that adjust the selective parameters of the Mamba scan; the overall procedure stays linear because the attention is computed via a single selective state-space pass rather than pairwise operations. Nevertheless, we agree the exposition can be tightened. The revised Section 3 now contains an explicit step-by-step derivation of the delay injection, the corresponding kernel definition, and a formal complexity proof confirming O(N) scaling even for variable or long lags. These additions directly support the reported efficiency and performance claims.
  Revision: yes
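To illustrate what "latent time-lag effects" between two variates means, here is a minimal sketch of lag estimation by brute-force search over a bounded lag set. It is an assumption for illustration, not the authors' delay-injection mechanism, which is not reproduced in this review; `lagged_covariance`, `estimate_delay`, and `align` are hypothetical helper names.

```python
from typing import List


def lagged_covariance(x: List[float], y: List[float], lag: int) -> float:
    """Covariance between x[t] and y[t + lag] over their overlapping range."""
    pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    return sum((a - mx) * (b - my) for a, b in pairs) / len(pairs)


def estimate_delay(x: List[float], y: List[float], max_lag: int) -> int:
    """Return the lag in [0, max_lag] at which y most strongly follows x."""
    return max(range(max_lag + 1), key=lambda lag: lagged_covariance(x, y, lag))


def align(y: List[float], delay: int) -> List[float]:
    """Drop the first `delay` steps of y so it lines up with the leading series."""
    return y[delay:]


leader = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, -2.0, 0.0, 1.0, 0.0, -1.0]
follower = [0.0, 0.0, 0.0] + leader[:-3]   # same signal, delayed by 3 steps
delay = estimate_delay(leader, follower, max_lag=5)
aligned = align(follower, delay)
```

This search costs on the order of `max_lag` times the series length, so it is linear in length only while the candidate lag set stays bounded. That is the restriction the referee wants the revised derivation to rule out or state explicitly.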
Circularity Check
No significant circularity: DeMa defines new modules and validates empirically on external tasks
Full rationale
The paper proposes DeMa as a dual-path architecture that decomposes multivariate time series into intra-series temporal dynamics (via Mamba-SSD) and inter-series interactions (via Mamba-DALA with delay-aware linear attention). These components are introduced as explicit innovations to address stated limitations of vanilla Mamba, with the overall model evaluated through experiments on five standard external tasks. No equations, predictions, or central claims reduce by construction to fitted parameters, self-citations, or renamed inputs; the derivation remains a sequence of architectural definitions followed by independent empirical results.
Axiom & Free-Parameter Ledger
invented entities (2)
- Mamba-SSD module: no independent evidence
- Mamba-DALA module: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArrowOfTime.lean, arrow_from_z (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "DeMa introduces three key innovations: (i) it decomposes the MTS into intra-series temporal dynamics and inter-series interactions; (ii) temporal path with Mamba-SSD ... (iii) variate path with Mamba-DALA that integrates delay-aware linear attention"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "Mamba-DALA ... global correlation delay ... token-level relative delay ... RoPE"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.