MedMamba: Recasting Mamba for Medical Time Series Classification
Pith reviewed 2026-05-10 08:53 UTC · model grok-4.3
The pith
MedMamba adapts bidirectional Mamba blocks with channel mixing and multi-scale tokenization to classify medical time series more accurately than prior approaches.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MedMamba is a multi-scale bidirectional state space model whose channel-mixing module, multi-scale convolutional tokenization, and bidirectional Mamba blocks directly instantiate the spatial centralization, multi-timescale composition, and non-causal dependency of physiological signals, producing new state-of-the-art accuracies, including 85.97 percent on PTB and 54.72 percent on ADFTD, together with a 4.6 times inference speedup.
What carries the argument
The bidirectional Mamba blocks that model global context linearly, paired with multi-scale convolutional tokenization for temporal decomposition and a channel-mixing module for cross-channel reparameterization.
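No code accompanies the claim, so the following is a minimal PyTorch sketch of one plausible reading of a bidirectional block: an inner causal sequence mixer run over the sequence and over its time reversal, with the two directions fused under a residual connection. The CausalMixer stand-in (a gated depthwise causal convolution) is an assumption of this sketch, not the paper's layer; a real Mamba layer such as mamba_ssm.Mamba could be dropped in instead.

```python
import torch
import torch.nn as nn

class CausalMixer(nn.Module):
    """Hypothetical stand-in for a causal Mamba layer: a gated
    depthwise causal convolution. Any causal (B, L, D) -> (B, L, D)
    module, e.g. mamba_ssm.Mamba, could replace it."""
    def __init__(self, d_model: int, kernel: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel,
                              padding=kernel - 1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (B, L, D)
        h = self.conv(x.transpose(1, 2))[..., :x.size(1)]  # trim to causal length
        return h.transpose(1, 2) * torch.sigmoid(self.gate(x))

class BidirectionalBlock(nn.Module):
    """One reading of a 'bidirectional Mamba block': run the causal
    mixer forward and on the flipped sequence, sum, add residual."""
    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = CausalMixer(d_model)
        self.bwd = CausalMixer(d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                        # x: (B, L, D)
        y = self.fwd(x) + self.bwd(x.flip(1)).flip(1)
        return self.norm(x + y)

tokens = torch.randn(2, 512, 64)                 # (batch, length, d_model)
print(BidirectionalBlock(64)(tokens).shape)      # torch.Size([2, 512, 64])
```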
If this is right
- The architecture models long-range dependencies effectively enough to set new marks on extended sequences such as SleepEDF.
- Inference runs 4.6 times faster than competing methods, supporting deployment in time-sensitive clinical settings.
- Linear scaling in sequence length removes the quadratic barrier that limits Transformer use on high-frequency medical recordings (a timing sketch follows this list).
- The same design principles produce consistent gains across EEG, ECG, and human activity modalities.
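The scaling point in the third bullet is cheap to check in isolation. A minimal timing sketch follows, using plain softmax attention as the quadratic baseline and a cumulative-sum scan as a crude stand-in for a linear-time SSM recurrence; both stand-ins are assumptions of this sketch, not the paper's kernels.

```python
import time
import torch

def attention_mix(x):                # O(L^2) time and memory in length L
    scores = x @ x.transpose(-2, -1) / x.size(-1) ** 0.5
    return scores.softmax(dim=-1) @ x

def linear_scan_mix(x):              # O(L): running mean as a toy scan
    steps = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
    return torch.cumsum(x, dim=1) / steps

for L in (1_000, 4_000, 8_000):      # e.g. seconds of high-frequency ECG
    x = torch.randn(1, L, 64)
    for name, fn in (("attention", attention_mix), ("scan", linear_scan_mix)):
        t0 = time.perf_counter()
        fn(x)
        print(f"L={L:>5}  {name:<9} {time.perf_counter() - t0:.4f}s")
```

Attention cost grows roughly fourfold with each doubling of L while the scan grows roughly linearly; the reported 4.6 times speedup would sit on top of constant-factor effects as well.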
Where Pith is reading between the lines
- The same tokenization and mixing pattern could be tested on other structured sequential data such as financial tick series or industrial sensor streams.
- Ablation studies that vary the number of tokenization scales per dataset could reveal which timescales dominate particular signal types (a tokenizer sketch follows this list).
- Extending the bidirectional blocks to streaming inputs with fixed memory would check whether the non-causal benefit survives online constraints.
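One concrete way to run the scale ablation in the second bullet is to parameterize the tokenizer by its list of kernel sizes and sweep that list per dataset. The sketch below uses parallel Conv1d branches fused by concatenation; the specific kernel sizes and the concatenation fusion are assumptions of this sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleTokenizer(nn.Module):
    """Parallel Conv1d branches, one per temporal scale; dropping
    entries from `kernels` gives the per-dataset scale ablation."""
    def __init__(self, in_ch: int, d_model: int, kernels=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, d_model // len(kernels), k, padding=k // 2)
            for k in kernels)

    def forward(self, x):                        # x: (B, C, L) raw signal
        tokens = torch.cat([b(x) for b in self.branches], dim=1)
        return tokens.transpose(1, 2)            # (B, L, d_model)

for kernels in [(3,), (3, 7), (3, 7, 15)]:       # sweep the scale set
    tok = MultiScaleTokenizer(in_ch=12, d_model=66, kernels=kernels)
    print(kernels, tok(torch.randn(2, 12, 500)).shape)
```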
Load-bearing premise
The observed accuracy and speed gains arise from the added modules correctly capturing the three stated inductive biases of physiological signals, rather than from differences in training procedure or dataset properties.
What would settle it
Retraining a plain bidirectional Mamba or a Transformer on the same six datasets with matched hyperparameter budgets and showing no accuracy gap would indicate that the proposed modules are not required for the reported gains.
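A hedged skeleton of that control: train every architecture on every dataset under one shared hyperparameter budget and several seeds, then compare the resulting score distributions. Everything below is a placeholder protocol, not the paper's code; run is a stub that would be replaced by a full training cycle.

```python
# Matched-budget control: identical optimizer settings, schedule, and
# seeds for every architecture. `run` is a hypothetical stub standing
# in for a full train-and-evaluate cycle under BUDGET.
import random
import statistics

BUDGET = dict(lr=1e-3, weight_decay=1e-2, epochs=50, batch_size=64)
SEEDS = range(5)
MODELS = ["medmamba", "plain_bimamba", "transformer"]
DATASETS = ["ptb", "adftd", "sleepedf"]          # three of the six

def run(model_name: str, dataset: str, seed: int) -> float:
    """Stub: returns a fake test accuracy so the skeleton executes;
    a real version would train under BUDGET with this seed."""
    random.seed(hash((model_name, dataset, seed)) % 2**32)
    return random.uniform(0.4, 0.9)

for m in MODELS:
    for d in DATASETS:
        accs = [run(m, d, s) for s in SEEDS]
        print(f"{m:>14} on {d:<9} mean={statistics.mean(accs):.4f} "
              f"sd={statistics.stdev(accs):.4f}")
```

If the plain bidirectional Mamba row matches the MedMamba row within noise, the proposed modules are not doing the claimed work.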
Original abstract
Medical time series, such as electrocardiograms (ECG) and electroencephalograms (EEG), exhibit complex temporal dynamics and structured cross-channel dependencies, posing fundamental challenges for automated analysis. Conventional convolutional and recurrent models struggle to capture long-range dependencies, while Transformer-based approaches incur quadratic complexity and often introduce redundant interactions that are misaligned with the intrinsic structure of physiological signals. To address these limitations, we propose MedMamba, a principle-driven multi-scale bidirectional state space architecture tailored for medical time series classification. Our design is guided by three key inductive biases of physiological signals: spatial centralization, multi-timescale temporal composition, and non-causal contextual dependency. These principles are instantiated through a lightweight channel-mixing module for cross-channel reparameterization, multi-scale convolutional tokenization for temporal decomposition, and bidirectional Mamba blocks for efficient global context modeling with linear complexity. Extensive experiments on six benchmark datasets spanning EEG, ECG, and human activity signals demonstrate that MedMamba consistently outperforms state-of-the-art methods across diverse modalities. Notably, it achieves 85.97% accuracy on PTB and establishes new state-of-the-art performance on the challenging ADFTD dataset (54.72% accuracy and 52.01% F1-score). Strong results on long-sequence benchmarks, such as SleepEDF, further validate its capability in modeling long-range dependencies. Moreover, MedMamba achieves a speedup of 4.6x in inference, highlighting its practicality for real-time clinical deployment. These results suggest that principle-guided state space modeling offers an effective and scalable alternative to Transformer-based approaches for medical time series analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MedMamba, a principle-driven multi-scale bidirectional state space model for medical time series classification. Guided by three inductive biases of physiological signals (spatial centralization, multi-timescale temporal composition, non-causal contextual dependency), it uses a lightweight channel-mixing module, multi-scale convolutional tokenization, and bidirectional Mamba blocks to achieve linear complexity. Experiments across six datasets (EEG, ECG, activity signals) report consistent outperformance of SOTA methods, including 85.97% accuracy on PTB, new SOTA on ADFTD (54.72% accuracy, 52.01% F1), strong long-sequence results on SleepEDF, and 4.6x inference speedup.
Significance. If the performance claims hold under rigorous validation, MedMamba offers a scalable, efficient alternative to quadratic-complexity Transformers for medical time series, with particular value for long-range dependency modeling in clinical settings. The explicit mapping of domain inductive biases to architectural components is a conceptual strength, and the reported efficiency gains support potential real-time deployment.
major comments (3)
- [Experimental Results] Reported accuracies (e.g., 85.97% on PTB, 54.72% on ADFTD) and the 4.6x speedup lack error bars, standard deviations across runs, and statistical significance tests against baselines, undermining confidence that the observed gains are robust rather than attributable to random variation or tuning (a significance-test sketch follows this list).
- [Ablation Studies] No experiments isolate the individual contributions of the channel-mixing module, the multi-scale convolutional tokenization, and the bidirectional Mamba blocks; such isolation is required to substantiate that these components correctly instantiate the three stated inductive biases and explain the performance differences versus baselines.
- [Experimental Results] Full information on baseline implementations, hyperparameter search protocols, and whether all models were trained and evaluated under identical conditions and data splits is missing; these details are load-bearing for the central claim of establishing new state-of-the-art results.
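The first comment is directly actionable once per-seed results exist. A minimal sketch of the kind of paired test being asked for follows; the accuracy arrays are illustrative placeholders, not values reported in the paper.

```python
# Paired significance test over per-seed accuracies. The two arrays
# are illustrative placeholders, not reported results.
import statistics
from scipy.stats import wilcoxon

medmamba = [0.858, 0.861, 0.855, 0.863, 0.857]   # hypothetical seeds
baseline = [0.849, 0.853, 0.850, 0.855, 0.848]

stat, p = wilcoxon(medmamba, baseline)
print(f"MedMamba: mean={statistics.mean(medmamba):.4f} "
      f"sd={statistics.stdev(medmamba):.4f}")
print(f"baseline: mean={statistics.mean(baseline):.4f} "
      f"sd={statistics.stdev(baseline):.4f}")
print(f"Wilcoxon signed-rank: stat={stat:.1f}, p={p:.4f}")
```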
minor comments (2)
- [Abstract] The abstract lists performance highlights but does not name all six datasets; explicit enumeration would aid readability.
- [Method] Notation for the bidirectional Mamba blocks and multi-scale tokenization could be clarified with a single diagram or pseudocode listing to make the architecture more immediately accessible (a pseudocode-level sketch follows below).
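In that spirit, a pseudocode-level guess at how the three modules compose, reusing the MultiScaleTokenizer and BidirectionalBlock classes from the sketches above; the composition order and the 1x1-convolution reading of channel mixing are assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn

class MedMambaSketch(nn.Module):
    """Guessed composition of the three modules. Assumes the
    MultiScaleTokenizer and BidirectionalBlock sketches above are
    in scope; every stage is a stand-in, not the paper's code."""
    def __init__(self, in_ch: int, d_model: int, n_classes: int, depth: int = 4):
        super().__init__()
        self.channel_mix = nn.Conv1d(in_ch, in_ch, 1)   # 1x1 cross-channel mix
        self.tokenize = MultiScaleTokenizer(in_ch, d_model)
        self.blocks = nn.Sequential(
            *[BidirectionalBlock(d_model) for _ in range(depth)])
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                  # x: (B, C, L) raw multichannel signal
        x = self.channel_mix(x)            # spatial centralization
        x = self.tokenize(x)               # multi-timescale composition
        x = self.blocks(x)                 # non-causal global context
        return self.head(x.mean(dim=1))    # pool over time, classify

model = MedMambaSketch(in_ch=12, d_model=66, n_classes=5)
print(model(torch.randn(2, 12, 500)).shape)   # torch.Size([2, 5])
```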