MedMamba: Recasting Mamba for Medical Time Series Classification

ZhengXiao He , Huayu Li , Xiwen Chen , Janet M Roveda , Jinghao Wen , Siyuan Tian , Ao Li

Authors on Pith no claims yet

Pith reviewed 2026-05-10 08:53 UTC · model grok-4.3

classification 📡 eess.SP cs.AIcs.LG

keywords medical time series classificationstate space modelsMambaEEG classificationECG classificationbidirectional modelingmulti-scale tokenizationlong-range dependencies

0 comments

The pith

MedMamba adapts bidirectional Mamba blocks with channel mixing and multi-scale tokenization to classify medical time series more accurately than prior approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that medical signals such as ECG and EEG follow three structural patterns that standard models overlook: signals are spatially centralized across channels, built from events at multiple timescales, and require context from both past and future. By embedding these patterns into a state space architecture through a lightweight channel mixer, convolutional tokens at several scales, and forward-backward Mamba layers, the model captures long dependencies at linear cost. If correct, this yields higher classification accuracy on clinical datasets while cutting inference time enough for real-time use, offering a practical replacement for quadratic Transformer models.

Core claim

MedMamba is a multi-scale bidirectional state space model whose channel-mixing module, multi-scale convolutional tokenization, and bidirectional Mamba blocks directly instantiate the spatial centralization, multi-timescale composition, and non-causal dependency of physiological signals, producing new state-of-the-art accuracies including 85.97 percent on PTB and 54.72 percent on ADFTD together with a 4.6 times inference speedup.

What carries the argument

The bidirectional Mamba blocks that model global context linearly, paired with multi-scale convolutional tokenization for temporal decomposition and a channel-mixing module for cross-channel reparameterization.

If this is right

The architecture models long-range dependencies effectively enough to set new marks on extended sequences such as SleepEDF.
Inference runs 4.6 times faster than competing methods, supporting deployment in time-sensitive clinical settings.
Linear scaling in sequence length removes the quadratic barrier that limits Transformer use on high-frequency medical recordings.
The same design principles produce consistent gains across EEG, ECG, and human activity modalities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tokenization and mixing pattern could be tested on other structured sequential data such as financial tick series or industrial sensor streams.
Ablation studies that vary the number of tokenization scales per dataset could reveal which timescales dominate particular signal types.
Extending the bidirectional blocks to streaming inputs with fixed memory would check whether the non-causal benefit survives online constraints.

Load-bearing premise

The observed accuracy and speed gains arise because the added modules correctly capture the three stated inductive biases of physiological signals rather than from differences in training procedure or dataset properties.

What would settle it

Retraining a plain bidirectional Mamba or a Transformer on the same six datasets with matched hyperparameter budgets and showing no accuracy gap would indicate that the proposed modules are not required for the reported gains.

Figures

Figures reproduced from arXiv: 2605.05214 by Ao Li, Huayu Li, Janet M Roveda, Jinghao Wen, Siyuan Tian, Xiwen Chen, ZhengXiao He.

**Figure 1.** Figure 1: Principle-driven framework for medical time series modeling. Top: Medical signals inherently exhibit temporal heterogeneity and spatial dependency across channels, necessitating models that jointly capture both dimensions. Bottom: Guided by these principles, we design a structured pipeline: channel mixing for spatial reparameterization, multi-scale embedding for temporal decomposition, and bidirectional … view at source ↗

**Figure 2.** Figure 2: Overview of MedMamba, a multi-scale bidirectional Mamba architecture for medical time series analysis. The framework first performs channel-aware preprocessing to capture cross-channel interactions in physiological signals. It then employs multi-scale convolutional embeddings to encode temporal dynamics at different resolutions, enabling simultaneous modeling of fast local variations and slow global trends… view at source ↗

**Figure 3.** Figure 3: Effect of channel mixing on the ADFTD dataset. Left: mean pairwise Pearson correlation within each class: Alzheimer’s Disease (AD), Frontotemporal Dementia (FTD), and Cognitively Normal (CN), where lower values indicate reduced channel redundancy. Right: interclass discriminability measured by the Frobenius norm of correlation matrix differences, where higher values indicate greater separability. Percenta… view at source ↗

**Figure 4.** Figure 4: Accuracy–efficiency trade-off on the ADFTD dataset. Each bubble represents a model, with the horizontal axis indicating inference throughput (samples/sec), the vertical axis indicating macro F1-score (%), and the bubble size proportional to the number of parameters. MedMamba (upper-right) achieves competitive F1-score with the highest throughput and a compact parameter budget. and clear inter-class margins… view at source ↗

read the original abstract

Medical time series, such as electrocardiograms (ECG) and electroencephalograms (EEG), exhibit complex temporal dynamics and structured cross-channel dependencies, posing fundamental challenges for automated analysis. Conventional convolutional and recurrent models struggle to capture long-range dependencies, while Transformer-based approaches incur quadratic complexity and often introduce redundant interactions that are misaligned with the intrinsic structure of physiological signals. To address these limitations, we propose MedMamba, a principle-driven multi-scale bidirectional state space architecture tailored for medical time series classification. Our design is guided by three key inductive biases of physiological signals: spatial centralization, multi-timescale temporal composition, and non-causal contextual dependency. These principles are instantiated through a lightweight channel-mixing module for cross-channel reparameterization, multi-scale convolutional tokenization for temporal decomposition, and bidirectional Mamba blocks for efficient global context modeling with linear complexity. Extensive experiments on six benchmark datasets spanning EEG, ECG, and human activity signals demonstrate that MedMamba consistently outperforms state-of-the-art methods across diverse modalities. Notably, it achieves 85.97% accuracy on PTB and establishes new state-of-the-art performance on the challenging ADFTD dataset (54.72% accuracy and 52.01% F1-score). Strong results on long-sequence benchmarks, such as SleepEDF, further validate its capability in modeling long-range dependencies. Moreover, MedMamba achieves a speedup of 4.6x in inference, highlighting its practicality for real-time clinical deployment. These results suggest that principle-guided state space modeling offers an effective and scalable alternative to Transformer-based approaches for medical time series analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MedMamba, a principle-driven multi-scale bidirectional state space model for medical time series classification. Guided by three inductive biases of physiological signals (spatial centralization, multi-timescale temporal composition, non-causal contextual dependency), it uses a lightweight channel-mixing module, multi-scale convolutional tokenization, and bidirectional Mamba blocks to achieve linear complexity. Experiments across six datasets (EEG, ECG, activity signals) report consistent outperformance of SOTA methods, including 85.97% accuracy on PTB, new SOTA on ADFTD (54.72% accuracy, 52.01% F1), strong long-sequence results on SleepEDF, and 4.6x inference speedup.

Significance. If the performance claims hold under rigorous validation, MedMamba offers a scalable, efficient alternative to quadratic-complexity Transformers for medical time series, with particular value for long-range dependency modeling in clinical settings. The explicit mapping of domain inductive biases to architectural components is a conceptual strength, and the reported efficiency gains support potential real-time deployment.

major comments (3)

[Experimental Results] Experimental section: reported accuracies (e.g., 85.97% on PTB, 54.72% on ADFTD) and the 4.6x speedup lack error bars, standard deviations across runs, or statistical significance tests against baselines, undermining confidence that observed gains are robust rather than attributable to random variation or tuning.
[Ablation Studies] Ablation studies: no experiments isolate the individual contributions of the channel-mixing module, multi-scale convolutional tokenization, and bidirectional Mamba blocks, which is required to substantiate that these components correctly instantiate the three stated inductive biases and explain the performance differences versus baselines.
[Experimental Results] Baseline details: full information on baseline implementations, hyperparameter search protocols, and whether all models were trained/evaluated under identical conditions and data splits is missing, which is load-bearing for the central claim of establishing new state-of-the-art results.

minor comments (2)

[Abstract] The abstract lists performance highlights but does not name all six datasets; explicit enumeration would aid readability.
[Method] Notation for the bidirectional Mamba blocks and multi-scale tokenization could be clarified with a single diagram or pseudocode equation to make the architecture more immediately accessible.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are identifiable. The three inductive biases are presented as domain assumptions rather than derived results.

pith-pipeline@v0.9.0 · 5611 in / 1151 out tokens · 23043 ms · 2026-05-10T08:53:44.512252+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 7 canonical work pages · 5 internal anchors

[1]

Automatic diagnosis of the 12-lead ecg using a deep neural network,

A. H. Ribeiro, M. H. Ribeiro, G. M. Paix ˜ao, D. M. Oliveira, P. R. Gomes, J. A. Canazart, M. P. Ferreira, C. R. Andersson, P. W. Macfarlane, W. Meira Jret al., “Automatic diagnosis of the 12-lead ecg using a deep neural network,”Nature communications, vol. 11, no. 1, p. 1760, 2020

2020
[2]

A comprehensive benchmark for electrocardiogram time-series,

Z. Tang, J. Qi, Y . Zheng, and J. Huang, “A comprehensive benchmark for electrocardiogram time-series,” inProceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 6490–6499

2025
[3]

Deep learning for elec- troencephalogram (eeg) classification tasks: a review,

A. Craik, Y . He, and J. L. Contreras-Vidal, “Deep learning for elec- troencephalogram (eeg) classification tasks: a review,”Journal of neural engineering, vol. 16, no. 3, p. 031001, 2019

2019
[4]

Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces,

V . J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces,”Journal of neural engineering, vol. 15, no. 5, p. 056013, 2018

2018
[5]

Eeg-emg faconformer: Frequency aware conv-transformer for the fusion of eeg and emg,

Z. He, M. Cai, L. Li, S. Tian, and R.-J. Dai, “Eeg-emg faconformer: Frequency aware conv-transformer for the fusion of eeg and emg,” in 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2024, pp. 3258–3261

2024
[6]

Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network,

A. Y . Hannun, P. Rajpurkar, M. Haghpanahi, G. H. Tison, C. Bourn, M. P. Turakhia, and A. Y . Ng, “Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network,”Nature medicine, vol. 25, no. 1, pp. 65–69, 2019

2019
[7]

A deep convolutional neural network model to classify heartbeats,

U. R. Acharya, S. L. Oh, Y . Hagiwara, J. H. Tan, M. Adam, A. Gertych, and R. San Tan, “A deep convolutional neural network model to classify heartbeats,”Computers in biology and medicine, vol. 89, pp. 389–396, 2017

2017
[8]

Real-time patient-specific ecg classification by 1-d convolutional neural networks,

S. Kiranyaz, T. Ince, and M. Gabbouj, “Real-time patient-specific ecg classification by 1-d convolutional neural networks,”IEEE transactions on biomedical engineering, vol. 63, no. 3, pp. 664–675, 2015

2015
[9]

Learning to diagnose with lstm recurrent neural networks,

Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzell, “Learning to diagnose with lstm recurrent neural networks,”arXiv preprint arXiv:1511.03677, 2016

work page arXiv 2016
[10]

Medformer: A multi-granularity patching transformer for medical time-series classifi- cation,

Y . Wang, N. Huang, T. Li, Y . Yan, and X. Zhang, “Medformer: A multi-granularity patching transformer for medical time-series classifi- cation,”Advances in Neural Information Processing Systems, vol. 37, pp. 36 314–36 341, 2024

2024
[11]

Informer: Beyond efficient transformer for long sequence time-series forecasting,

H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” inProceedings of the AAAI conference on artificial intel- ligence, vol. 35, no. 12, 2021, pp. 11 106–11 115

2021
[12]

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Y . Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “itrans- former: Inverted transformers are effective for time series forecasting,” arXiv preprint arXiv:2310.06625, 2023

work page internal anchor Pith review arXiv 2023
[13]

Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,

H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” Advances in neural information processing systems, vol. 34, pp. 22 419– 22 430, 2021

2021
[14]

Efficiently Modeling Long Sequences with Structured State Spaces

A. Gu, K. Goel, and C. R ´e, “Efficiently modeling long sequences with structured state spaces,”arXiv preprint arXiv:2111.00396, 2021

work page internal anchor Pith review arXiv 2021
[15]

Mamba: Linear-time sequence modeling with selective state spaces,

A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” inFirst conference on language modeling, 2024

2024
[16]

A review of deep learning methods for irregularly sampled medical time series data,

C. Sun, M. Song, D. Cai, B. Zhang, H. Li, and S. Hong, “A review of deep learning methods for irregularly sampled medical time series data,” Health Data Science, 2020

2020
[17]

Is mamba effective for time series forecasting?

Z. Wang, F. Kong, S. Feng, M. Wang, X. Yang, H. Zhao, D. Wang, and Y . Zhang, “Is mamba effective for time series forecasting?”Neurocom- puting, vol. 619, p. 129178, 2025

2025
[18]

Gcmnet: A global context mamba network for long-term time series forecasting,

X. Liu, J. Ren, H. Zhang, and E. Zhang, “Gcmnet: A global context mamba network for long-term time series forecasting,”Pattern Recog- nition, p. 113287, 2026

2026
[19]

Ecg-mamba: Cardiac abnormality classification with non-uniform-mix augmentation on 12-lead ecgs,

H. Jiang, H. Mutahira, S. Wei, and M. S. Muhammad, “Ecg-mamba: Cardiac abnormality classification with non-uniform-mix augmentation on 12-lead ecgs,”IEEE Journal of Translational Engineering in Health and Medicine, 2025

2025
[20]

Decoupled Weight Decay Regularization

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[21]

SGDR: Stochastic Gradient Descent with Warm Restarts

——, “Sgdr: Stochastic gradient descent with warm restarts,”arXiv preprint arXiv:1608.03983, 2016

work page Pith review arXiv 2016
[22]

Analysis of electroencephalograms in alzheimer’s disease patients with multiscale entropy,

J. Escudero, D. Ab ´asolo, R. Hornero, P. Espino, and M. L´opez, “Analysis of electroencephalograms in alzheimer’s disease patients with multiscale entropy,”Physiological measurement, vol. 27, no. 11, pp. 1091–1106, 2006

2006
[23]

Dice-net: a novel convolution-transformer architecture for alzheimer detection in eeg signals,

A. Miltiadous, E. Gionanidis, K. D. Tzimourta, N. Giannakeas, and A. T. Tzallas, “Dice-net: a novel convolution-transformer architecture for alzheimer detection in eeg signals,”IEEe Access, vol. 11, pp. 71 840– 71 858, 2023

2023
[24]

Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals,

A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals,”circulation, vol. 101, no. 23, pp. e215–e220, 2000

2000
[25]

Ptb-xl, a large publicly available electro- cardiography dataset,

P. Wagner, N. Strodthoff, R.-D. Bousseljot, D. Kreiseler, F. I. Lunze, W. Samek, and T. Schaeffter, “Ptb-xl, a large publicly available electro- cardiography dataset,”Scientific data, vol. 7, no. 1, p. 154, 2020

2020
[26]

A pub- lic domain dataset for human activity recognition using smartphones

D. Anguita, A. Ghio, L. Oneto, X. Parra, J. L. Reyes-Ortizet al., “A pub- lic domain dataset for human activity recognition using smartphones.” inEsann, vol. 3, no. 1, 2013, pp. 3–4

2013
[27]

Crossformer: Transformer utilizing cross- dimension dependency for multivariate time series forecasting,

Y . Zhang and J. Yan, “Crossformer: Transformer utilizing cross- dimension dependency for multivariate time series forecasting,” inThe eleventh international conference on learning representations, 2023

2023
[28]

Fedformer: Frequency enhanced decomposed transformer for long-term series fore- casting,

T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: Frequency enhanced decomposed transformer for long-term series fore- casting,” inInternational conference on machine learning. PMLR, 2022, pp. 27 268–27 286

2022
[29]

Multi-resolution time-series transformer for long-term forecasting,

Y . Zhang, L. Ma, S. Pal, Y . Zhang, and M. Coates, “Multi-resolution time-series transformer for long-term forecasting,” inInternational con- ference on artificial intelligence and statistics. PMLR, 2024, pp. 4222– 4230

2024
[30]

Non-stationary transformers: Exploring the stationarity in time series forecasting,

Y . Liu, H. Wu, J. Wang, and M. Long, “Non-stationary transformers: Exploring the stationarity in time series forecasting,”Advances in neural information processing systems, vol. 35, pp. 9881–9893, 2022

2022
[31]

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

Y . Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers,”arXiv preprint arXiv:2211.14730, 2022

work page internal anchor Pith review arXiv 2022
[32]

Reformer: The Efficient Transformer

N. Kitaev, Ł. Kaiser, and A. Levskaya, “Reformer: The efficient trans- former,”arXiv preprint arXiv:2001.04451, 2020

work page internal anchor Pith review arXiv 2001
[33]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in neural information processing systems, vol. 30, 2017

2017
[34]

Visualizing data using t-sne

L. Van der Maaten and G. Hinton, “Visualizing data using t-sne.”Journal of machine learning research, vol. 9, no. 11, 2008

2008