ActiNet: An Open-Source Tool for Activity Intensity Classification of Wrist-Worn Accelerometry Using Self-Supervised Deep Learning
Pith reviewed 2026-05-18 10:51 UTC · model grok-4.3
The pith
A self-supervised deep learning model called ActiNet classifies activity intensity from wrist accelerometers more accurately than random forest baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The ActiNet model, consisting of HARNet, a self-supervised 18-layer modified ResNet-V2, followed by hidden Markov model smoothing, distinguishes labels of activity intensity with a mean macro F1 score of 0.82 and a mean Cohen's kappa score of 0.86. This performance exceeds that of a random forest plus HMM baseline trained and validated on the same CAPTURE-24 dataset, which reached mean scores of 0.76 and 0.80. The improvements remain consistent across subgroups of age and sex.
What carries the argument
HARNet, an 18-layer modified ResNet-V2 model trained self-supervised on raw wrist accelerometer signals, which learns feature representations that support activity intensity classification before hidden Markov model smoothing.
If this is right
- The performance advantage of ActiNet holds across age and sex subgroups within the study population.
- Classification outputs from ActiNet produce different estimated daily activity intensity compositions than those from the random forest baseline.
- The open-source model enables direct extraction of activity intensity labels for future large-scale studies on physical activity and health.
- Pairing self-supervised learning with hidden Markov model smoothing yields measurable classification gains over prior methods.
Where Pith is reading between the lines
- Applying ActiNet to accelerometer records from other countries or age ranges would test whether the reported accuracy generalizes beyond the original cohort.
- More precise intensity labels could strengthen statistical associations when researchers examine links between specific activity patterns and health conditions.
- The open-source release allows other groups to process their own wrist data without repeating the full training process.
Load-bearing premise
The 151-participant CAPTURE-24 dataset supplies a representative sample whose activity patterns and device characteristics generalize to the diverse populations and hardware used in broader epidemiological studies.
What would settle it
Applying the trained ActiNet model to an independent wrist-accelerometer dataset from a different demographic group or device brand and checking whether the macro F1 score drops below the random forest baseline of 0.76.
read the original abstract
The use of accurate and reliable open-source human activity recognition (HAR) models on passively collected wrist-accelerometer data is essential in large-scale epidemiological studies that investigate the association between physical activity and health outcomes. While self-supervised learning has generated considerable excitement in improving HAR, the extent to which these models, coupled with hidden Markov models (HMMs), would make a tangible improvement to classification performance and the effect this may have on the predicted daily activity intensity compositions is unknown. Using up to 24 hours of wrist-worn accelerometer data from 151 CAPTURE-24 participants (aged 18 - 91, mean age 42, 66% female), we trained the ActiNet model, consisting HARNet, a self-supervised, 18-layer, modified ResNet-V2 model, followed by hidden Markov model (HMM) smoothing to classify labels of activity intensity. The performance of this model, evaluated using 5-fold stratified group cross-validation, was then compared to a baseline random forest (RF) + HMM, established in existing literature. Differences in performance and classification outputs were compared with different subgroups of age and sex within the CAPTURE-24 population. The ActiNet model was able to distinguish labels of activity intensity with a mean macro F1 score of 0.82 and a mean Cohen's kappa score of 0.86. This exceeded the performance of the RF + HMM, trained and validated on the same dataset, with mean scores of 0.76 and 0.80, respectively. The improvements in performance were consistent across subgroups of age and sex. These findings encourage the use of ActiNet for the extraction of activity intensity labels from wrist-accelerometer data in future epidemiological studies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ActiNet, consisting of HARNet (an 18-layer modified ResNet-V2 trained with self-supervised learning) followed by HMM smoothing, for classifying activity intensity from wrist-worn accelerometer data. Using 5-fold stratified group cross-validation on the CAPTURE-24 dataset (151 participants, ages 18-91), it reports mean macro F1 of 0.82 and Cohen's kappa of 0.86, outperforming a RF + HMM baseline (0.76 and 0.80). Improvements are consistent across age and sex subgroups within the cohort, and the authors recommend the open-source tool for extracting activity intensity labels in future epidemiological studies.
Significance. If the internal performance gains and subgroup consistency hold under broader testing, ActiNet would offer a useful open-source advance for HAR in epidemiology by leveraging self-supervised pre-training and temporal smoothing. The reproducible cross-validation protocol and direct comparison to an established baseline are strengths that support potential adoption, though the absence of external validation limits immediate claims of broad utility.
major comments (3)
- [Abstract and Discussion] Abstract and Discussion: The recommendation that ActiNet 'should be used' in future epidemiological studies rests on performance measured entirely within the CAPTURE-24 cohort via 5-fold stratified group CV. No external validation on independent datasets with differing demographics, body-mass distributions, or device calibrations is reported, so the claim that the model extracts population-invariant intensity labels is not yet supported.
- [Methods and Results] Methods and Results: Self-supervised pre-training of the 18-layer ResNet occurs on only ~151 person-days from the same CAPTURE-24 sample with no external corpus mentioned. This setup risks encoding cohort-specific movement patterns or label noise rather than device- and population-invariant features, directly affecting the generalizability asserted for broad epidemiological use.
- [Results] Results: While mean macro F1 and kappa are reported, the manuscript does not provide per-fold standard deviations, confidence intervals, or a full breakdown of confusion matrices for the intensity classes. This makes it difficult to assess whether the 0.06 F1 improvement over RF+HMM is robust or driven by particular activity categories.
minor comments (2)
- [Abstract] Abstract: Add one sentence summarizing label definitions (e.g., sedentary/light/moderate/vigorous thresholds) and key preprocessing steps so readers can immediately evaluate the reported metrics.
- [Methods] Methods: Explicitly state the exact modifications made to the ResNet-V2 architecture and the self-supervised pretext task used for pre-training.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment point by point below, with revisions made to the manuscript where feasible to strengthen the presentation and moderate claims.
read point-by-point responses
-
Referee: [Abstract and Discussion] Abstract and Discussion: The recommendation that ActiNet 'should be used' in future epidemiological studies rests on performance measured entirely within the CAPTURE-24 cohort via 5-fold stratified group CV. No external validation on independent datasets with differing demographics, body-mass distributions, or device calibrations is reported, so the claim that the model extracts population-invariant intensity labels is not yet supported.
Authors: We agree that external validation would be required to support stronger claims of population-invariant features. Our study employs 5-fold stratified group cross-validation specifically to evaluate performance while preventing participant-level leakage within the available cohort. We have revised the abstract and discussion to replace the phrasing that ActiNet 'should be used' with the more measured statement that it 'may offer a useful open-source approach' for extracting activity intensity labels, while explicitly calling for external validation on independent datasets in future work. revision: yes
-
Referee: [Methods and Results] Methods and Results: Self-supervised pre-training of the 18-layer ResNet occurs on only ~151 person-days from the same CAPTURE-24 sample with no external corpus mentioned. This setup risks encoding cohort-specific movement patterns or label noise rather than device- and population-invariant features, directly affecting the generalizability asserted for broad epidemiological use.
Authors: We acknowledge this as a genuine limitation of the current work. The self-supervised pre-training was performed on unlabeled wrist-accelerometer segments from the CAPTURE-24 cohort using standard augmentations to encourage learning of general temporal features rather than label-specific patterns. We have expanded the methods description of the pre-training procedure and added an explicit limitations paragraph in the discussion noting the risk of cohort-specific encoding and the value of future multi-cohort pre-training when larger external unlabeled corpora become available. revision: partial
-
Referee: [Results] Results: While mean macro F1 and kappa are reported, the manuscript does not provide per-fold standard deviations, confidence intervals, or a full breakdown of confusion matrices for the intensity classes. This makes it difficult to assess whether the 0.06 F1 improvement over RF+HMM is robust or driven by particular activity categories.
Authors: We appreciate this suggestion for improved reporting. We have added the per-fold standard deviations for macro F1 and Cohen's kappa, along with 95% confidence intervals obtained via bootstrap resampling across folds. We have also included the mean confusion matrix (averaged over the five folds) broken down by the three intensity classes, which shows that the performance gains are distributed across categories rather than concentrated in any single one. revision: yes
- Absence of external validation on independent datasets, which cannot be addressed without access to additional data sources beyond the current CAPTURE-24 cohort.
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper reports empirical performance of ActiNet (self-supervised 18-layer ResNet-V2 + HMM) versus RF+HMM baseline using 5-fold stratified group cross-validation on the CAPTURE-24 dataset. The macro F1 (0.82) and Cohen's kappa (0.86) scores are computed directly from held-out folds and compared on the same data splits; no equations or derivations reduce these metrics to definitions of the input features, labels, or fitted parameters by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked to justify the central claims. The evaluation is self-contained as standard supervised validation on empirical accelerometer data.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The activity intensity labels in the CAPTURE-24 dataset are accurate and consistent enough to serve as ground truth for supervised evaluation.
- domain assumption Wrist-worn accelerometer signals from the study population capture activity intensity in a manner generalizable to other epidemiological cohorts.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArrowOfTime.leanforward_accumulates, arrow_from_z echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
self-supervised learning to learn the arrow of time classification task on one day of unlabelled data from roughly 100,000 UK Biobank participants, to distinguish the difference between forward and reverse time accelerometer signals
-
IndisputableMonolith/Foundation/ArrowOfTime.leanTemporalSequence, zAtStep echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Hidden Markov models (HMMs) are used by both models to smooth the series of classifications made by successive 30-second windows
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Walmsley, R. et al. Reallocation of time between device-measured movement behaviours and risk of incident cardiovascular disease. Br J Sports Med bjsports-2021-104050 (2021) doi:10.1136/bjsports-2021-104050
-
[2]
Shreves, A. H. et al. Amount and intensity of daily total physical activity, step count and risk of incident cancer in the UK Biobank. (2025) doi:10.1136/bjsports-2024-109360
-
[3]
Borodulin, K. & Anderssen, S. Physical activity: associations with health and summary of guidelines. Food Nutr Res 67, 10.29219/fnr.v67.9719 (2023)
-
[4]
Janssen, I. & LeBlanc, A. G. Systematic review of the health benefits of physical activity and fitness in school-aged children and youth. International Journal of Behavioral Nutrition and Physical Activity 7, 40 (2010)
work page 2010
-
[5]
(World Health Organization, Geneva, 2020)
WHO Guidelines on Physical Activity and Sedentary Behaviour. (World Health Organization, Geneva, 2020)
work page 2020
-
[6]
(World Health Organization, Geneva, 2019)
Guidelines on Physical Activity, Sedentary Behaviour and Sleep for Children under 5 Years of Age. (World Health Organization, Geneva, 2019)
work page 2019
-
[7]
Full, K. M. et al. Validation of a Physical Activity Accelerometer Device Worn on the Hip and Wrist Against Polysomnography. Sleep Health 4, 209–216 (2018)
work page 2018
-
[8]
Quante, M. et al. Practical considerations in using accelerometers to assess physical activity, sedentary behavior, and sleep. Sleep Health 1, 275–284 (2015)
work page 2015
-
[9]
Doherty, A., Chan, S., Yuan, H. & Walmsley, R. accelerometer: A Python Toolkit for Extracting Physical Activity and Behavior Metrics from Wearable Sensor Data. https://doi.org/10.5281/zenodo.14515076 (2020)
- [10]
-
[11]
Rabiner, L. & Juang, B. An introduction to hidden Markov models. IEEE ASSP Magazine 3, 4– 16 (1986)
work page 1986
-
[12]
Yuan, H. et al. Self-supervised learning for human activity recognition using 700,000 person- days of wearable data. npj Digit. Med. 7, 1–10 (2024)
work page 2024
-
[13]
Haresamudram, H., Essa, I. & Plötz, T. Assessing the State of Self-Supervised Human Activity Recognition Using Wearables. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 6, 1–47 (2022)
work page 2022
-
[15]
From scale to speed: Adaptive test-time scaling for image editing.CoRR, abs/2603.00141, 2026
Sridhar, N. & Myers, L. Human Activity Recognition on wrist-worn accelerometers using self- supervised neural networks. (2021) doi:10.48550/ARXIV .2112.12272
work page internal anchor Pith review doi:10.48550/arxiv 2021
-
[16]
Identity Mappings in Deep Residual Networks
He, K., Zhang, X., Ren, S. & Sun, J. Identity Mappings in Deep Residual Networks. Preprint at https://doi.org/10.48550/arXiv.1603.05027 (2016)
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1603.05027 2016
-
[17]
SMALL, S. R. et al. Self-Supervised Machine Learning to Characterize Step Counts from Wrist-Worn Accelerometers in the UK Biobank. Medicine & Science in Sports & Exercise 56, (2024)
work page 2024
-
[18]
Hoffman, B. et al. A benchmark for computational analysis of animal behavior, using animal-borne tags. Movement Ecology 12, 78 (2024)
work page 2024
-
[19]
Acquah, A., Chan, S. & Doherty, A. ActiNet: An activity classification model based on self- supervised learning for wrist-worn accelerometer data. https://doi.org/10.5281/zenodo.15365422 (2025)
-
[20]
Chang, S. C. et al. Capture-24: Activity tracker dataset for human activity recognition. (2021)
work page 2021
-
[21]
Chan, S. et al. CAPTURE-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition. Sci Data 11, 1–11 (2024)
work page 2024
-
[22]
Chan, S. & Doherty, A. actipy: Python toolkit to process wearable sensor data. https://doi.org/10.5281/zenodo.14514488 (2021)
-
[23]
On the theory of filter amplifiers
Butterworth, S. On the theory of filter amplifiers. Experimental Wireless and the Wireless Engineer 7, 536–541 (1930)
work page 1930
-
[24]
van Hees, V . T. et al. Autocalibration of accelerometer data for free-living physical activity assessment using local gravity and temperature: an evaluation on four continents. J Appl Physiol (1985) 117, 738–744 (2014)
work page 1985
-
[25]
Kingma, D. P . & Ba, J. Adam: A Method for Stochastic Optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2017)
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1412.6980 2017
-
[26]
Forney, G. D. The viterbi algorithm. Proceedings of the IEEE 61, 268–278 (1973)
work page 1973
-
[27]
StratifiedGroupKFold. scikit-learn https://scikit- learn/stable/modules/generated/sklearn.model_selection.StratifiedGroupKFold.html
-
[28]
Sztyler, T. & Stuckenschmidt, H. On-body localization of wearable devices: An investigation of position-aware activity recognition. in 2016 IEEE International Conference on Pervasive Computing and Communications (PerCom) 1–9 (IEEE, Sydney, Australia, 2016). doi:10.1109/PERCOM.2016.7456521
-
[29]
Weiss, G. M., Yoneda, K. & Hayajneh, T. Smartphone and Smartwatch-Based Biometrics Using Activities of Daily Living. IEEE Access 7, 133190–133202 (2019)
work page 2019
-
[30]
Bland, J. M. & Altman, D. G. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1, 307–310 (1986)
work page 1986
-
[31]
Correlation and regression | The BMJ
11. Correlation and regression | The BMJ. The BMJ | The BMJ: leading general medical journal. Research. Education. Comment https://www.bmj.com/about-bmj/resources- readers/publications/statistics-square-one/11-correlation-and-regression (2020)
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.