AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting
Pith reviewed 2026-05-08 08:13 UTC · model grok-4.3
The pith
AdaMamba adds input-dependent frequency bases to Mamba state updates so the model can adapt to frequency differences across variables in long time series.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AdaMamba endogenizes adaptive frequency analysis inside the Mamba state-space process. An interactive patch encoder first models inter-variable dynamics; then an adaptive frequency-gated module produces input-dependent frequency bases and replaces the standard temporal gate with a unified time-frequency gate. This lets the model scale state transitions according to learned frequency importance while retaining Mamba's long-range dependency modeling. On seven public LTSF benchmarks and two domain-specific datasets the method records higher accuracy than prior state-of-the-art approaches at comparable computational cost.
What carries the argument
The adaptive frequency-gated state-space module that creates input-dependent frequency bases on the fly and merges them into a single time-frequency forgetting gate inside each Mamba block.
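The paper describes this module only conceptually, so as a reading aid, here is a minimal NumPy sketch of what an input-dependent frequency-gated forgetting gate could look like inside a diagonal selective-SSM update. Every name here (`W_dt`, `W_freq`, the cosine bases, the sigmoid fusion) is our own assumption for illustration, not the paper's notation or implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def freq_gated_update(h, x, A, W_dt, W_freq, freqs, t):
    """One hypothetical state update combining Mamba's temporal
    forgetting gate with an input-dependent frequency gate.

    h      : (d,) hidden state
    x      : (d,) current input
    A      : (d,) negative decay parameters of a diagonal SSM
    W_dt   : (d, d) projection producing the input-dependent step size
    W_freq : (k, d) projection producing weights over k frequency bases
    freqs  : (k,) fixed candidate frequencies the bases are built from
    t      : scalar time index
    """
    # Input-dependent discretization step (standard selective-SSM idea).
    dt = np.log1p(np.exp(W_dt @ x))        # softplus -> positive step
    # Input-dependent importance weights over the frequency bases.
    w = sigmoid(W_freq @ x)                # (k,)
    # Evaluate cosine bases at time t and mix them with those weights.
    basis = np.cos(2 * np.pi * freqs * t)  # (k,)
    g_freq = sigmoid(w @ basis)            # scalar frequency gate in (0, 1)
    # Unified time-frequency forgetting gate: temporal decay scaled by
    # the frequency gate, then the usual recurrence.
    forget = np.exp(A * dt) * g_freq
    return forget * h + dt * x

# Tiny smoke demo with random parameters (illustration only).
rng = np.random.default_rng(0)
d, k = 4, 3
h_new = freq_gated_update(
    h=rng.standard_normal(d), x=rng.standard_normal(d),
    A=-np.abs(rng.standard_normal(d)),
    W_dt=rng.standard_normal((d, d)), W_freq=rng.standard_normal((k, d)),
    freqs=np.array([0.1, 0.25, 0.5]), t=7,
)
```

Because the frequency gate only rescales the existing forgetting term rather than adding a parallel branch, a design of this shape would keep per-step cost close to plain Mamba, which is consistent with the efficiency claim above.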
If this is right
- Forecast accuracy rises on multivariate series whose variables differ in dominant frequencies even when they appear aligned in the time domain.
- Computational cost stays close to standard Mamba because the frequency adaptation is folded into the existing state update rather than added as a separate module.
- The same architecture can be applied to domain-specific series without requiring hand-crafted frequency features for each new domain.
- Long-range dependency modeling remains intact because the frequency gate operates alongside rather than replacing the original Mamba recurrence.
Where Pith is reading between the lines
- The same input-dependent frequency mechanism could be dropped into other linear-time state-space architectures to test whether the gain is specific to Mamba or general to the state-space family.
- If the adaptive bases prove stable, the method offers a route to remove separate frequency preprocessing steps that currently sit outside most deep forecasters.
- On very high-dimensional series the interactive patch encoder may become a bottleneck, suggesting a natural next test of whether the frequency gating alone is sufficient when variable count grows.
Load-bearing premise
Real-world time series contain enough measurable frequency heterogeneity across variables that an input-dependent basis generated inside the Mamba update can exploit it without causing extra overfitting or training instability.
What would settle it
If an ablation that replaces the learned input-dependent frequency bases with fixed, non-adaptive bases produces equal or better accuracy on the same seven benchmarks, the value of the adaptive component is falsified.
Original abstract
Accurate long-term time series forecasting (LTSF) requires the capture of complex long-range dependencies and dynamic periodic patterns. Recent advances in frequency-domain analysis offer a global perspective for uncovering temporal characteristics. However, real-world time series often exhibit pronounced cross-domain heterogeneity where variables that appear synchronized in the time domain can differ substantially in the frequency domain. Existing frequency-based LTSF methods often rely on implicit assumptions of cross-domain homogeneity, which limits their ability to adapt to such intricate variability. To effectively integrate frequency-domain analysis with temporal dependency learning, we propose AdaMamba, a novel framework that endogenizes adaptive and context-aware frequency analysis within the Mamba state-space update process. Specifically, AdaMamba introduces an interactive patch encoding module to capture inter-variable interaction dynamics. Then, we develop an adaptive frequency-gated state-space module that generates input-dependent frequency bases, and generalizes the conventional temporal forgetting gate into a unified time-frequency forgetting gate. This allows dynamic calibration of state transitions based on learned frequency-domain importance, while preserving Mamba's capability in modeling long-range dependencies. Extensive experiments on seven public LTSF benchmarks and two domain-specific datasets demonstrate that AdaMamba consistently outperforms state-of-the-art methods in forecasting accu racy while maintaining competitive computational efficiency. The code of AdaMamba is available at https://github.com/XDjiang25/AdaMamba.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AdaMamba, a Mamba-based model for long-term time series forecasting that adds an interactive patch encoding module for inter-variable dynamics and an adaptive frequency-gated state-space module. The latter generates input-dependent frequency bases and generalizes the temporal forgetting gate into a unified time-frequency gate to dynamically adjust state transitions according to learned frequency importance while aiming to retain long-range dependency modeling. Experiments on seven public LTSF benchmarks plus two domain-specific datasets are reported to show consistent accuracy gains over state-of-the-art methods at competitive computational cost.
Significance. If the adaptive frequency integration can be shown to preserve the selective SSM stability and discretization properties while capturing cross-domain frequency heterogeneity, the approach would offer a concrete mechanism for hybrid time-frequency modeling in forecasting tasks where periodic patterns vary across variables or domains. The public code release supports reproducibility.
major comments (2)
- [Abstract / adaptive frequency-gated state-space module description] The description of the adaptive frequency-gated state-space module (abstract and method section) states that the conventional temporal forgetting gate is generalized to a unified time-frequency forgetting gate via input-dependent frequency bases. No explicit equation is supplied showing how the frequency component enters the state matrix A, input matrix B, or the discretization step of the underlying SSM. Without this, it is impossible to verify that the selective long-range dynamics and stability guarantees of the original Mamba are retained; any unanalyzed time-frequency coupling would directly undermine the central performance claim.
- [Experiments section] The experimental claims of consistent outperformance rest on benchmark results, yet the manuscript provides no details on the number of independent runs, statistical significance tests, error bars, or ablation studies that isolate the contribution of the input-dependent frequency bases versus the base Mamba architecture. This absence makes it difficult to rule out that observed gains arise from hyper-parameter tuning or other unanalyzed factors rather than the proposed adaptive mechanism.
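On the first major comment: for concreteness, one illustrative form the missing formulation could take, with $\Delta_t$ the input-dependent step, $\phi(t)$ fixed frequency bases, and $w(x_t)$ input-dependent basis weights. This is our reconstruction of what such an equation might look like, not the paper's actual definition.

```latex
% Illustrative only: one way a frequency gate could enter the
% discretized selective-SSM update without breaking stability.
\Delta_t = \operatorname{softplus}(W_\Delta x_t), \qquad
g_t = \sigma\!\bigl(w(x_t)^{\top}\phi(t)\bigr) \in (0,1), \qquad
h_t = g_t \, e^{\Delta_t A}\, h_{t-1} + \Delta_t B\, x_t .
```

In this particular form, since $g_t \in (0,1)$ and $\operatorname{Re}(A) < 0$, the combined gate remains contractive and Mamba's stability argument carries over; whether AdaMamba's actual coupling has this property is exactly what the report asks the authors to show.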
minor comments (2)
- [Abstract] The abstract contains a typographical spacing error: 'accu racy' should read 'accuracy'.
- [Method] Notation for the unified time-frequency gate and frequency bases should be introduced with a clear symbol table or inline definitions to avoid ambiguity when the equations are eventually supplied.
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.
Point-by-point responses
- Referee: [Abstract / adaptive frequency-gated state-space module description] The description of the adaptive frequency-gated state-space module (abstract and method section) states that the conventional temporal forgetting gate is generalized to a unified time-frequency forgetting gate via input-dependent frequency bases. No explicit equation is supplied showing how the frequency component enters the state matrix A, input matrix B, or the discretization step of the underlying SSM. Without this, it is impossible to verify that the selective long-range dynamics and stability guarantees of the original Mamba are retained; any unanalyzed time-frequency coupling would directly undermine the central performance claim.
Authors: We agree that the current description is insufficient for verifying the integration details. The manuscript describes the adaptive frequency-gated state-space module at a conceptual level but does not supply the explicit equations for how input-dependent frequency bases modify the state matrix A, input matrix B, or the discretization step. In the revised version, we will add these precise formulations (including the modified SSM update rules and the unified time-frequency gate) to the method section, along with a brief analysis confirming that selective long-range dynamics and stability properties are preserved. revision: yes
- Referee: [Experiments section] The experimental claims of consistent outperformance rest on benchmark results, yet the manuscript provides no details on the number of independent runs, statistical significance tests, error bars, or ablation studies that isolate the contribution of the input-dependent frequency bases versus the base Mamba architecture. This absence makes it difficult to rule out that observed gains arise from hyper-parameter tuning or other unanalyzed factors rather than the proposed adaptive mechanism.
Authors: We acknowledge this gap in experimental reporting. The manuscript presents results across the benchmarks but omits the number of runs, error bars, significance tests, and ablations isolating the frequency bases. We will revise the Experiments section to include: five independent runs per model with reported means and standard deviations; paired t-tests for statistical significance against baselines; and a dedicated ablation comparing the full model to a non-adaptive Mamba variant. These changes will substantiate the contribution of the proposed adaptive mechanism. revision: yes
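The promised reporting protocol is standard; a minimal SciPy sketch of the paired test across seeds (the error values below are made up for illustration, not results from the paper):

```python
import numpy as np
from scipy import stats

# Hypothetical test-set MSEs from five independent runs of each model,
# paired by random seed (illustrative numbers only).
mse_baseline = np.array([0.42, 0.40, 0.43, 0.41, 0.44])
mse_adamamba = np.array([0.38, 0.37, 0.39, 0.36, 0.40])

# Paired t-test: runs share seeds, so ttest_rel is the appropriate test.
t_stat, p_value = stats.ttest_rel(mse_baseline, mse_adamamba)

mean_gap = mse_baseline.mean() - mse_adamamba.mean()
print(f"mean MSE gap = {mean_gap:.3f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```

Reporting means with standard deviations alongside the paired p-value, plus the fixed-basis ablation, would directly address the falsification test stated in "What would settle it".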
Circularity Check
No circularity: novel adaptive module with independent empirical validation
full rationale
The paper proposes AdaMamba as an extension of Mamba SSMs that introduces an interactive patch encoding module and an adaptive frequency-gated state-space module generating input-dependent frequency bases to create a unified time-frequency forgetting gate. These are presented as architectural innovations with explicit design goals (capturing cross-domain frequency heterogeneity while preserving long-range dependency modeling). The central claims of outperformance rest on experimental results across seven LTSF benchmarks and two domain-specific datasets, not on any definitional equivalence, fitted-parameter renaming, or self-citation chain that reduces the result to its inputs. No load-bearing step equates a derived quantity to a fitted input or prior self-cited result by construction; the derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
invented entities (1)
- adaptive frequency-gated state-space module: no independent evidence