Mamba-based Deep Learning Approach for Sleep Staging on a Wireless Multimodal Wearable System without Electroencephalography
Pith reviewed 2026-05-23 07:05 UTC · model grok-4.3
The pith
A Mamba-based model infers wake, NREM, and REM sleep stages from chest ECG, motion, temperature, and finger PPG signals without EEG.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A Mamba-based recurrent neural network trained on ECG, triaxial accelerometry, chest and finger temperature, and photoplethysmography from the ANNE One system attains balanced accuracies of 84.02% (3-class), 75.30% (4-class), and 65.11% (5-class) when evaluated against manually scored polysomnography in 357 adults attending a tertiary care sleep clinic.
What carries the argument
Mamba-based recurrent neural network architecture that processes aligned sequences of chest ECG, accelerometry, temperature, and finger PPG to output sleep-stage probabilities.
If this is right
- Sleep staging becomes feasible with a non-intrusive wireless wearable that omits EEG electrodes.
- The same model architecture and training procedure apply directly to adults at tertiary sleep clinics.
- Ensembling multiple Mamba variants with similar architectures raises the reported balanced accuracy, F1, kappa, and MCC values.
- The approach works on automatically aligned multimodal signals that include both chest and finger sensors.
Where Pith is reading between the lines
- The same signals could support longitudinal tracking of sleep architecture outside a single-night lab visit.
- Accuracy on the 5-class task suggests the model captures some distinction between N1 and N2 even without EEG.
- If the alignment step generalizes, the method could be applied to datasets that lack simultaneous PSG.
Load-bearing premise
Manual PSG scoring after automatic ECG alignment supplies accurate ground-truth labels that the wearable signals can be mapped onto without systematic mismatch or label noise.
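One way to picture the alignment this premise depends on is a lag search that maximizes the cross-correlation between the two ECG traces. The source only states that alignment used the ECG channels with manual confirmation, so the sketch below is an assumption, not the authors' procedure. The function name `best_lag`, the toy beat positions, and the sampling rate `fs_hz` are all hypothetical.

```python
def best_lag(reference, other, max_lag):
    """Return the shift in samples (searched over -max_lag..max_lag) at which
    `other` best matches `reference`. A positive value means `other` is delayed."""
    def score(lag):
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)


# Toy "beat trains": ones mark beats at irregular positions (invented data).
n_samples = 40
reference = [0.0] * n_samples
other = [0.0] * n_samples
for beat in (5, 12, 30):
    reference[beat] = 1.0
    other[beat + 3] = 1.0  # the second device sees every beat 3 samples later

fs_hz = 4.0  # hypothetical sampling rate of the toy traces
lag = best_lag(reference, other, max_lag=10)
offset_seconds = lag / fs_hz
```

Dividing the lag by the sampling rate gives a residual offset in seconds. Reporting its mean and spread across recordings is the kind of quantitative check the referee report asks for.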
What would settle it
Recordings from a new cohort using the same wearable system, scored by independent raters without ECG alignment to PSG, would falsify the reported performance if the 3-class accuracy fell below 70%.
Original abstract
Study Objectives: We investigate a Mamba-based deep learning approach for sleep staging on signals from ANNE One (Sibel Health, Evanston, IL), a non-intrusive dual-module wireless wearable system measuring chest electrocardiography (ECG), triaxial accelerometry, and chest temperature, and finger photoplethysmography and finger temperature. Methods: We obtained wearable sensor recordings from 357 adults undergoing concurrent polysomnography (PSG) at a tertiary care sleep lab. Each PSG recording was manually scored and these annotations served as ground truth labels for training and evaluation of our models. PSG and wearable sensor data were automatically aligned using their ECG channels with manual confirmation by visual inspection. We trained a Mamba-based recurrent neural network architecture on these recordings. Ensembling of model variants with similar architectures was performed. Results: After ensembling, the model attains a 3-class (wake, non rapid eye movement [NREM] sleep, rapid eye movement [REM] sleep) balanced accuracy of 84.02%, F1 score of 84.23%, Cohen's $\kappa$ of 72.89%, and a Matthews correlation coefficient (MCC) score of 73.00%; a 4-class (wake, light NREM [N1/N2], deep NREM [N3], REM) balanced accuracy of 75.30%, F1 score of 74.10%, Cohen's $\kappa$ of 61.51%, and MCC score of 61.95%; a 5-class (wake, N1, N2, N3, REM) balanced accuracy of 65.11%, F1 score of 66.15%, Cohen's $\kappa$ of 53.23%, MCC score of 54.38%. Conclusions: Our Mamba-based deep learning model can successfully infer major sleep stages from the ANNE One, a wearable system without electroencephalography (EEG), and can be applied to data from adults attending a tertiary care sleep clinic.
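The three label granularities named in the abstract nest inside one another, which a small relabeling sketch makes concrete. The stage names come from the abstract. The dictionary names and the `collapse` helper are hypothetical, and the source does not say whether the coarser results came from collapsing 5-class output or from training separate models.

```python
# 5-class stages as listed in the abstract.
FIVE_CLASS = ("wake", "N1", "N2", "N3", "REM")

# 5-class -> 4-class: N1 and N2 merge into light NREM, N3 is deep NREM.
TO_FOUR = {"wake": "wake", "N1": "light", "N2": "light", "N3": "deep", "REM": "REM"}

# 5-class -> 3-class: every NREM stage merges into a single NREM label.
TO_THREE = {"wake": "wake", "N1": "NREM", "N2": "NREM", "N3": "NREM", "REM": "REM"}


def collapse(stages, mapping):
    """Relabel a per-epoch hypnogram using one of the mappings above."""
    return [mapping[stage] for stage in stages]


# Invented example hypnogram, one label per scored epoch.
hypnogram = ["wake", "N1", "N2", "N3", "N2", "REM", "wake"]
four = collapse(hypnogram, TO_FOUR)
three = collapse(hypnogram, TO_THREE)
```

Each coarsening step removes distinctions that are hard to make without EEG, which is consistent with the reported scores rising from the 5-class to the 3-class task.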
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a Mamba-based recurrent neural network for 3-, 4-, and 5-class sleep staging from multimodal signals (chest ECG, triaxial accelerometry, chest temperature, finger PPG, finger temperature) acquired by the ANNE One wearable. Recordings from 357 adults undergoing concurrent PSG at a tertiary sleep clinic serve as the dataset; PSG-derived labels after automatic ECG alignment with manual visual confirmation are used as ground truth. After ensembling model variants, the reported metrics are 84.02% balanced accuracy / 72.89% κ (3-class), 75.30% / 61.51% (4-class), and 65.11% / 53.23% (5-class).
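For readers less familiar with the agreement statistics quoted here, the following sketch computes balanced accuracy, Cohen's κ, and the multiclass Matthews correlation coefficient from a confusion matrix. These are the standard textbook definitions, not code from the manuscript, and the toy labels are invented for illustration.

```python
from math import sqrt


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def balanced_accuracy(m):
    """Mean per-class recall over classes that occur in the true labels."""
    recalls = [row[k] / sum(row) for k, row in enumerate(m) if sum(row) > 0]
    return sum(recalls) / len(recalls)


def _marginals(m):
    n = len(m)
    total = sum(sum(row) for row in m)
    correct = sum(m[k][k] for k in range(n))
    true_counts = [sum(m[k]) for k in range(n)]
    pred_counts = [sum(m[r][k] for r in range(n)) for k in range(n)]
    return total, correct, true_counts, pred_counts


def cohen_kappa(m):
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    s, c, t, p = _marginals(m)
    p_o = c / s
    p_e = sum(tk * pk for tk, pk in zip(t, p)) / s**2
    return (p_o - p_e) / (1 - p_e)


def multiclass_mcc(m):
    """Multiclass generalization of the Matthews correlation coefficient."""
    s, c, t, p = _marginals(m)
    num = c * s - sum(tk * pk for tk, pk in zip(t, p))
    den = sqrt((s**2 - sum(pk**2 for pk in p)) * (s**2 - sum(tk**2 for tk in t)))
    return num / den if den else 0.0


# Invented 3-class example: 0 = wake, 1 = NREM, 2 = REM.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]
m = confusion_matrix(y_true, y_pred, 3)
bal = balanced_accuracy(m)
kappa = cohen_kappa(m)
mcc = multiclass_mcc(m)
```

In this toy example κ and MCC coincide because the true and predicted class counts match. Both discount chance agreement, which is why they come out lower than balanced accuracy here.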
Significance. If the performance numbers reflect genuine generalization, the work would demonstrate that a non-EEG wearable can achieve clinically usable sleep staging in a real-world patient population, extending the reach of quantitative sleep assessment beyond laboratory PSG. The choice of the Mamba architecture for long physiological sequences is technically relevant and could be of interest to the wearable-physiology community.
major comments (3)
- [Methods] Methods (alignment procedure): Automatic ECG-based alignment of PSG and wearable recordings is followed only by 'manual confirmation by visual inspection,' yet no quantitative metrics (mean residual offset, standard deviation, or confirmation reliability) are supplied. Because even 5–10 s misalignments can shift N1/N2 or REM boundaries, this directly undermines the validity of the ground-truth labels used to compute the reported 5-class balanced accuracy (65.11%) and κ (53.23%).
- [Methods] Methods (data partitioning and evaluation): The manuscript supplies no description of train–test splits, cross-validation scheme, subject-wise partitioning, or handling of class imbalance and demographic balance. Without these details the ensembled performance figures cannot be interpreted as evidence of generalization rather than in-sample fit.
- [Results] Results (ensembling): Performance is reported only after ensembling, but neither the individual model accuracies nor the ensembling procedure (e.g., probability averaging, number of variants) are described, preventing assessment of whether the Mamba architecture itself, rather than the ensemble, drives the gains.
minor comments (2)
- [Abstract] Abstract: The phrase 'major sleep stages' is used in the conclusions while 5-class results are presented; a brief clarification of terminology would improve readability.
- [Throughout] Notation: Ensure consistent expansion of abbreviations (NREM, REM) on first use in every section.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on the alignment procedure, data partitioning, and ensembling details. We address each point below and will revise the manuscript accordingly to improve clarity and transparency.
- Referee: [Methods] Methods (alignment procedure): Automatic ECG-based alignment of PSG and wearable recordings is followed only by 'manual confirmation by visual inspection,' yet no quantitative metrics (mean residual offset, standard deviation, or confirmation reliability) are supplied. Because even 5–10 s misalignments can shift N1/N2 or REM boundaries, this directly undermines the validity of the ground-truth labels used to compute the reported 5-class balanced accuracy (65.11%) and κ (53.23%).
  Authors: We agree that quantitative metrics on alignment would strengthen the description of ground-truth validity. In the revised manuscript we will report the mean residual offset and standard deviation obtained from the ECG cross-correlation step, along with the number of recordings requiring manual adjustment during visual inspection. The alignment procedure used high-resolution ECG signals to minimize offsets, but we acknowledge that explicit metrics are needed to fully address concerns about boundary shifts in sleep stages.
  Revision: yes
- Referee: [Methods] Methods (data partitioning and evaluation): The manuscript supplies no description of train–test splits, cross-validation scheme, subject-wise partitioning, or handling of class imbalance and demographic balance. Without these details the ensembled performance figures cannot be interpreted as evidence of generalization rather than in-sample fit.
  Authors: We concur that the current manuscript lacks these methodological details. The revised version will specify the subject-wise partitioning strategy (ensuring no data leakage across subjects), the train-test split ratios or cross-validation scheme employed, the approach to class imbalance (e.g., weighted loss or oversampling), and any measures taken to maintain demographic balance across partitions.
  Revision: yes
- Referee: [Results] Results (ensembling): Performance is reported only after ensembling, but neither the individual model accuracies nor the ensembling procedure (e.g., probability averaging, number of variants) are described, preventing assessment of whether the Mamba architecture itself, rather than the ensemble, drives the gains.
  Authors: We will expand both the methods and results sections to describe the ensembling procedure, including the number of model variants, the sources of variation among them (e.g., random seeds or hyperparameter differences), and the aggregation method (probability averaging). We will also report performance metrics for the individual models to allow readers to evaluate the contribution of the Mamba architecture separately from the ensemble.
  Revision: yes
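The aggregation method named in the last response, probability averaging, can be sketched in a few lines. The nested-list layout, the helper names, and the toy probabilities are assumptions made for illustration. The number of variants and how they differ are not given in the source.

```python
def average_probabilities(model_outputs):
    """model_outputs[m][e][k] is the probability model m gives class k at epoch e.
    Returns the per-epoch class probabilities averaged over models."""
    n_models = len(model_outputs)
    n_epochs = len(model_outputs[0])
    n_classes = len(model_outputs[0][0])
    return [
        [sum(out[e][k] for out in model_outputs) / n_models for k in range(n_classes)]
        for e in range(n_epochs)
    ]


def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=row.__getitem__)


# Three hypothetical model variants, two epochs, three classes (invented values).
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]

ensemble = average_probabilities([model_a, model_b, model_c])
predictions = [argmax(row) for row in ensemble]
```

In the first epoch two of the three toy models individually favor class 0, yet the averaged probabilities favor class 1. This shows how averaging can differ from majority voting and why per-model results are needed to judge the ensemble's contribution.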
Circularity Check
No circularity: empirical ML evaluation against independent PSG labels
The paper trains and evaluates a Mamba-based neural network on wearable signals with ground-truth sleep stage labels obtained from separate manual PSG scoring. Reported metrics (balanced accuracy, F1, kappa, MCC) are computed directly against these external annotations on held-out recordings. No equations, fitted parameters, or self-citations reduce any claimed result to a quantity defined by the model itself. Alignment is described as a preprocessing step but introduces no definitional loop. The pipeline is standard supervised learning and remains self-contained against external benchmarks.
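Evaluation on held-out recordings only supports a generalization claim if no participant contributes data to both partitions. The source does not describe the split, so the following is a minimal sketch of one subject-wise partition under assumed inputs: one subject identifier per recording, a hypothetical `test_fraction`, and a fixed seed.

```python
import random


def subject_wise_split(subject_ids, test_fraction=0.2, seed=0):
    """Split recording indices so all recordings of a subject share one partition."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


# Invented identifiers: subjects s1 and s3 each contribute two recordings.
recording_subjects = ["s1", "s1", "s2", "s3", "s3", "s4", "s5"]
train_idx, test_idx = subject_wise_split(recording_subjects, test_fraction=0.2, seed=0)
```

Splitting on subjects rather than on recordings or epochs prevents data from the same person appearing on both sides of the evaluation.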
Axiom & Free-Parameter Ledger
free parameters (1)
- Mamba architecture and training hyperparameters
axioms (1)
- domain assumption: Manual scoring of concurrent PSG recordings constitutes accurate ground-truth sleep stage labels after ECG alignment
Supplementary Materials (excerpt)
S.1 Mamba Architecture
S.1.1 Selective SSM
At the core of the Mamba architecture is the state space model (SSM). ...
CRNN architecture components:
- A convolutional component, consisting of four 1D residual blocks (see Figure S5, right) that successively extract features and downsize the window-size axis, flattening it into one embedding vector per 30-second window.
- A recurrent component, consisting of a bidirectional (as in the RNN model, S.1) three-layer Mamba block that processes the embedding vectors sequentially, followed by an element-wise multi-layer perceptron (MLP) that reduces the feature-size axis to logits for each class.
Figure S5. Overview and implementation details of the CRNN architecture. Each 1D ...