arxiv: 2605.15719 · v1 · pith:6G7HNJSFnew · submitted 2026-05-15 · 💻 cs.ET

Lightweight Cross-Device Sleep Tracking on the WeBe Wearable Platform

Pith reviewed 2026-05-19 18:15 UTC · model grok-4.3

classification 💻 cs.ET
keywords sleep tracking · accelerometer data · wearable platform · lightweight pipeline · cross-device evaluation · total sleep time · activity features · threshold classification

The pith

A simple pipeline on raw accelerometer signals estimates total sleep time across wearables with a mean error of 27 to 42 minutes

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce a lightweight sleep tracking approach that uses only raw motion sensor readings instead of proprietary algorithms. The data are converted into an activity score for each short epoch, smoothed over time, normalized, and then split into sleep and wake using one fixed cutoff value chosen from calibration data. Tested first on a public dataset (MMASH) and then on everyday recordings from the authors' WeBe device (three participants, five sessions), the method produces total sleep time estimates that are off by roughly 42 minutes on the public set and 27 minutes in real use. This matters because it offers a transparent way to monitor sleep that is intended to run on different devices without extra training or per-device adjustment.

Core claim

The central discovery is that converting raw accelerometer signals into epoch-level activity features, followed by temporal smoothing and normalized scoring, allows accurate sleep versus wake classification via a single globally calibrated threshold. On the MMASH dataset this yields a mean absolute error of 41.6 minutes in Total Sleep Time along with onset and offset errors of 6.3 and 7.4 minutes. On real-world data collected with the WeBe platform from three participants over five sessions the corresponding errors are 27.4, 13.9 and 8.0 minutes, outperforming a commercial ActiGraph pipeline relative to ground truth.

What carries the argument

Epoch-level activity features derived from raw accelerometer signals, processed by temporal smoothing, normalized scoring, and classification with a globally calibrated threshold.
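
As a rough illustration of the four stages named above, here is a minimal Python sketch. This page does not give the paper's exact feature definition, smoothing window, or threshold, so the mean-absolute-deviation feature, the 5-epoch moving average, the z-score normalization, and all function names are assumptions made for illustration only.

```python
import numpy as np

def epoch_activity(acc, fs, epoch_s=30):
    """Per-epoch activity: mean absolute deviation of the vector magnitude.

    acc is an (n_samples, 3) array of raw accelerometer readings.
    The feature itself is an assumed stand-in, not the paper's definition.
    """
    mag = np.linalg.norm(np.asarray(acc, dtype=float), axis=1)
    n = int(fs * epoch_s)                    # samples per epoch
    n_epochs = len(mag) // n                 # drop the incomplete tail
    mag = mag[: n_epochs * n].reshape(n_epochs, n)
    return np.abs(mag - mag.mean(axis=1, keepdims=True)).mean(axis=1)

def smooth(x, window=5):
    """Moving average; np.convolve zero-pads, so edge epochs are damped."""
    return np.convolve(x, np.ones(window) / window, mode="same")

def normalize(x):
    """Z-score over the whole recording (assumed normalization)."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def classify_sleep(acc, fs, threshold, epoch_s=30, window=5):
    """Return one boolean per epoch; True means the epoch is scored as sleep."""
    score = normalize(smooth(epoch_activity(acc, fs, epoch_s), window))
    return score < threshold
```

Calling `classify_sleep(acc, fs=50, threshold=0.0)` on an `(n_samples, 3)` array returns one label per 30-second epoch. The threshold `0.0` is a placeholder, not the calibrated value from the paper.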

If this is right

  • Open-source sleep tracking becomes feasible without relying on closed commercial algorithms.
  • Consistent performance across different wearable hardware reduces the need for per-device model retraining.
  • Low computational demands support deployment on battery-constrained devices for continuous monitoring.
  • Baseline errors provide a reference point for improving or comparing future sleep analysis methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the same scoring approach to other physiological signals could broaden its use in health wearables.
  • Validating on larger cohorts would strengthen claims of generalizability to diverse populations.
  • Real-time implementation on the device itself could enable immediate feedback on sleep quality.

Load-bearing premise

Normalizing the activity scores allows a single threshold to work reliably for sleep and wake detection no matter the specific wearable device or the user's daily routine.
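
One way to see why normalization could make a single threshold portable is the following worked equation. It assumes the normalization is a z-score (the page's simulated rebuttal mentions a per-session z-score) and that a second device reports an affine rescaling of the first device's activity feature; both are assumptions, and the symbols are introduced here for illustration.

```latex
% Assumption: device B's per-epoch feature is an affine map of device A's.
a'_k = g\,a_k + b, \qquad g > 0
% Session mean and standard deviation transform accordingly.
\mu' = g\,\mu + b, \qquad \sigma' = g\,\sigma
% The normalized score is therefore unchanged.
z'_k = \frac{a'_k - \mu'}{\sigma'}
     = \frac{g\,a_k + b - (g\,\mu + b)}{g\,\sigma}
     = \frac{a_k - \mu}{\sigma}
     = z_k
```

Under these assumptions a threshold on z carries over unchanged. The argument does not cover devices that differ nonlinearly (for example in noise floor or filtering), nor sessions whose sleep-to-wake ratio shifts the session mean, which is where the premise could fail.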

What would settle it

A new experiment using a different wearable sensor type and a larger group of participants where the mean absolute error in total sleep time exceeds 60 minutes would indicate that the global threshold does not generalize as claimed.
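
The check described above can be sketched in a few lines of Python. The definitions of onset and offset used here (first and last epoch scored as sleep) are assumptions, since this page does not state how the paper defines them, and the function names are hypothetical.

```python
import numpy as np

def sleep_metrics(labels, epoch_s=30):
    """Return (TST, onset, offset) in minutes from per-epoch sleep labels.

    Onset is taken as the start of the first sleep epoch and offset as the
    end of the last sleep epoch; both definitions are assumed.
    """
    labels = np.asarray(labels, dtype=bool)
    idx = np.flatnonzero(labels)
    tst = labels.sum() * epoch_s / 60.0
    if idx.size == 0:
        return tst, None, None               # no sleep detected
    onset = idx[0] * epoch_s / 60.0
    offset = (idx[-1] + 1) * epoch_s / 60.0
    return tst, onset, offset

def mean_absolute_error(pred, truth):
    """Mean absolute difference between predicted and reference values."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(pred - truth)))
```

With per-session TST estimates and reference values from a new cohort, `mean_absolute_error(pred_tst, ref_tst) > 60.0` would correspond to the falsifying outcome described above.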

Figures

Figures reproduced from arXiv: 2605.15719 by Ehsan Kourkchi, Houman Homayoun, Krishi Prashant Shah, Setareh Rafatirad, Wei Shao, Zequan Liang.

Figure 1: Overview of the proposed sleep tracking pipeline.
Figure 2: Wearable devices used in this study: (a) commercial
Figure 3: Example sleep score trajectory for a MMASH user.
Figure 4: Threshold sensitivity on the MMASH dataset.

Original abstract

Wearable devices are widely used for continuous health monitoring, yet reliable sleep tracking on emerging platforms remains underexplored due to reliance on proprietary algorithms and device-specific activity representations. We present a lightweight and reproducible sleep tracking pipeline that operates directly on raw accelerometer signals. The method converts data into epoch-level activity features, applies temporal smoothing and normalized scoring, and performs sleep/wake classification using a globally calibrated threshold. We calibrate the model on the Multilevel Monitoring of Activity and Sleep in Healthy People (MMASH) dataset and evaluate it in a cross-device study using the WeBe wearable platform and a commercial ActiGraph device. On MMASH, the method achieves a mean absolute error of 41.6 minutes in Total Sleep Time (TST), with onset and offset errors of 6.3 and 7.4 minutes. On real-world WeBe data from three participants across five sessions, it achieves a mean TST error of 27.4 minutes and onset and offset errors of 13.9 and 8.0 minutes. In contrast, a commercial ActiGraph pipeline shows larger discrepancies relative to ground truth. These results demonstrate accurate and generalizable sleep tracking using a simple and reproducible pipeline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a lightweight, reproducible pipeline for sleep/wake classification from raw accelerometer signals on wearable devices. Raw data are converted to epoch-level activity features, followed by temporal smoothing, normalized scoring, and binary classification via a single globally calibrated threshold. The threshold is fit on the MMASH dataset (yielding 41.6 min TST MAE, 6.3 min onset error, 7.4 min offset error) and evaluated on WeBe platform recordings from three participants across five sessions (27.4 min TST MAE, 13.9 min onset, 8.0 min offset), with comparisons showing smaller discrepancies than a commercial ActiGraph pipeline.

Significance. If the central results hold under larger-scale validation, the work offers a simple, device-agnostic alternative to proprietary sleep algorithms that could facilitate sleep monitoring on emerging or low-cost wearables. The concrete error metrics, explicit cross-device comparison, and emphasis on reproducibility constitute clear strengths that would support broader adoption if the generalizability concerns are addressed.

major comments (2)
  1. [Real-world WeBe evaluation] The headline claim of accurate and generalizable cross-device sleep tracking with a single globally calibrated threshold depends on the WeBe results (27.4 min TST MAE). However, these rest on data from only three participants across five sessions. Such limited N cannot establish robust transfer across hardware, populations, or real-world conditions without retraining; participant-specific movement or sleep patterns could inflate apparent performance. MMASH calibration does not compensate for the tiny real-world sample when asserting cross-device robustness.
  2. [Methods / Abstract] The abstract and methods provide no details on exact feature definitions, the normalization procedure, or validation against potential confounds such as varying sensor placement or participant demographics. Without these, it is difficult to assess whether the normalized scoring step embeds fitted parameters whose independence from the final performance numbers is guaranteed.
minor comments (2)
  1. [Abstract] The abstract could explicitly state the number of participants and sessions in the WeBe evaluation to allow readers to immediately contextualize the generalizability claims.
  2. [Discussion] Consider adding an explicit limitations paragraph that directly addresses the small real-world sample size and its implications for the cross-device claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped us improve the clarity and balance of the manuscript. We address each major comment below.

  1. Referee: [Real-world WeBe evaluation] The headline claim of accurate and generalizable cross-device sleep tracking with a single globally calibrated threshold depends on the WeBe results (27.4 min TST MAE). However, these rest on data from only three participants across five sessions. Such limited N cannot establish robust transfer across hardware, populations, or real-world conditions without retraining; participant-specific movement or sleep patterns could inflate apparent performance. MMASH calibration does not compensate for the tiny real-world sample when asserting cross-device robustness.

    Authors: We agree that the small sample size (three participants, five sessions) in the WeBe evaluation is a genuine limitation for strong claims of generalizability. The WeBe recordings represent an initial real-world cross-device test rather than a large-scale validation study. In the revised manuscript we have added an explicit Limitations section that qualifies the cross-device results, notes the preliminary nature of the transfer demonstration, and states that larger cohorts will be needed to confirm robustness across populations and hardware variations. We retain the observation that the globally calibrated threshold (fit only on MMASH) was applied without retraining, but we no longer frame the WeBe numbers as definitive proof of broad generalizability. revision: yes

  2. Referee: [Methods / Abstract] The abstract and methods provide no details on exact feature definitions, the normalization procedure, or validation against potential confounds such as varying sensor placement or participant demographics. Without these, it is difficult to assess whether the normalized scoring step embeds fitted parameters whose independence from the final performance numbers is guaranteed.

    Authors: We have expanded the Methods section with precise definitions of the epoch-level features (vector magnitude per 30-second epoch, activity counts, and zero-crossing rate), the exact normalization formula (per-session z-score of the activity feature), and a new paragraph discussing potential confounds including wrist placement variability and the demographic characteristics of the MMASH and WeBe cohorts. The revised text explicitly states that no WeBe data were used in threshold calibration or normalization parameter fitting, confirming that the reported performance reflects transfer of a fixed, globally determined threshold. revision: yes
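
The three features named in this simulated response could be computed roughly as follows. This is a sketch only: the response is itself simulated, the paper's definitions are not reproduced on this page, and in particular `count_proxy` is an assumed stand-in for "activity counts", not the ActiGraph count algorithm.

```python
import numpy as np

def epoch_features(acc, fs, epoch_s=30):
    """Return (mean magnitude, count proxy, zero-crossing rate) per epoch.

    acc is an (n_samples, 3) array; each epoch must hold at least 2 samples.
    """
    mag = np.linalg.norm(np.asarray(acc, dtype=float), axis=1)
    n = int(fs * epoch_s)                    # samples per epoch
    k = len(mag) // n                        # number of complete epochs
    mag = mag[: k * n].reshape(k, n)
    centered = mag - mag.mean(axis=1, keepdims=True)
    mean_mag = mag.mean(axis=1)
    # Assumed proxy: total absolute deviation from the epoch mean.
    count_proxy = np.abs(centered).sum(axis=1)
    # Fraction of adjacent sample pairs where the centered signal changes sign.
    sign = np.signbit(centered)
    zcr = (sign[:, 1:] != sign[:, :-1]).sum(axis=1) / (n - 1)
    return mean_mag, count_proxy, zcr
```

Any of the three arrays could then be z-scored per session before thresholding, as the response describes.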

Circularity Check

0 steps flagged

No circularity: calibration on MMASH and independent evaluation on WeBe data

The pipeline converts raw accelerometer signals to epoch-level features, applies temporal smoothing and normalized scoring, then classifies sleep/wake via a single globally calibrated threshold. The threshold is fitted on the MMASH dataset and the resulting model is evaluated on separate real-world WeBe sessions from three participants. No equations or steps reduce the reported TST MAE, onset/offset errors, or cross-device comparison to the calibration inputs by construction. The WeBe results constitute an out-of-sample test rather than a self-referential prediction, and no self-citation chain or ansatz smuggling is present in the derivation.
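
The separation between calibration and evaluation described here can be made concrete with a short Python sketch. The objective (minimizing TST mean absolute error over a grid of candidate thresholds) is an assumption, since this page does not say how the paper selects the threshold, and the function names are hypothetical.

```python
import numpy as np

def tst_minutes(scores, threshold, epoch_s=30):
    """Total sleep time in minutes: epochs whose score falls below threshold."""
    return float(np.sum(np.asarray(scores) < threshold) * epoch_s / 60.0)

def calibrate_threshold(sessions, grid, epoch_s=30):
    """Pick the grid value with the lowest TST MAE on calibration sessions.

    sessions is a list of (normalized_scores, reference_tst_minutes) pairs
    drawn only from the calibration dataset.
    """
    def mae(th):
        return np.mean([abs(tst_minutes(s, th, epoch_s) - ref)
                        for s, ref in sessions])
    return min(grid, key=mae)
```

The returned threshold is then frozen and passed to `tst_minutes` for held-out sessions from another device, so no evaluation data influence the fitted value.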

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The pipeline rests on standard assumptions from wearable signal processing; no new physical entities are introduced.

free parameters (1)
  • globally calibrated sleep/wake threshold
    Single threshold applied after normalized scoring; value obtained by calibration on MMASH dataset.
axioms (1)
  • domain assumption: Raw accelerometer signals can be reliably converted into epoch-level activity features that distinguish sleep from wake.
    Core premise of the feature-extraction stage described in the abstract.

pith-pipeline@v0.9.0 · 5769 in / 1277 out tokens · 69714 ms · 2026-05-19T18:15:08.890349+00:00 · methodology

