pith. machine review for the scientific record.

arxiv: 2605.08296 · v1 · submitted 2026-05-08 · 💻 cs.CV · eess.SP


BenchHAR: Benchmarking Self-Supervised Learning for Generalizable Sensor-based Activity Recognition


Pith reviewed 2026-05-12 01:23 UTC · model grok-4.3

classification: 💻 cs.CV · eess.SP
keywords: human activity recognition · self-supervised learning · generalization · wearable sensors · benchmark · CNN encoder · hybrid pretraining · sensor data

The pith

Self-supervised learning methods for sensor-based human activity recognition struggle to generalize to unseen distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

BenchHAR evaluates how well self-supervised learning transfers to new sensor data distributions in human activity recognition, where labeled data is scarce and devices vary. The work tests eight SSL methods across twelve encoder-classifier pairs on a curated dataset of roughly 258,000 samples to find which design choices actually improve performance on held-out users, devices, and activities. A reader would care because wearable HAR supports healthcare and behavior studies, yet most current models break when moved to real-world conditions with different hardware or body placements. The results identify concrete patterns, such as hybrid pretraining working best and certain data sources transferring better than others.

Core claim

Existing SSL methods struggle to achieve satisfactory generalization performance. The hybrid paradigm combining reconstruction and contrastive pretraining achieves the best overall performance. CNN encoders exhibit the strongest ability to learn generalizable representations, while more expressive classifier architectures further improve generalization. Increasing the amount of pretraining data from downstream activity classes consistently improves generalization, while adding more labeled data yields limited gains. Incorporating unlabeled data from non-downstream activity classes does not improve generalization. Sensor data collected from custom-grade devices generalizes better than that from research-grade devices, and data from limb placements transfers more effectively to trunk positions.

What carries the argument

The BenchHAR evaluation framework, which assembles a large multi-source dataset and runs controlled comparisons of SSL pretraining objectives and model architectures to measure accuracy on unseen target distributions.
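The protocol can be sketched in miniature. The snippet below is an illustrative stand-in, not the paper's code: synthetic per-user sensor windows, a nearest-centroid classifier in place of a pretrained encoder, and a leave-users-out split playing the role of BenchHAR's held-out target distributions. Every name and parameter here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_user_data(user_id, n=60, n_classes=3, dim=6):
    """Synthetic accelerometer-like windows: each class has its own mean,
    and each user adds a private offset (a stand-in for device/placement shift)."""
    y = rng.integers(0, n_classes, size=n)
    class_means = np.arange(n_classes)[:, None] * np.ones(dim)
    user_shift = rng.normal(0.0, 0.1, size=dim) * user_id
    X = class_means[y] + user_shift + rng.normal(0.0, 0.2, size=(n, dim))
    return X, y

def nearest_centroid_fit(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    classes = np.array(sorted(centroids))
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return classes[d.argmin(axis=0)]

# Leave-users-out protocol: fit on source users, report accuracy on an unseen user
# whose distribution shift is larger than anything seen during training.
source_X, source_y = zip(*(make_user_data(u) for u in range(4)))
X_train, y_train = np.concatenate(source_X), np.concatenate(source_y)
X_test, y_test = make_user_data(user_id=9)  # held-out "target distribution"

model = nearest_centroid_fit(X_train, y_train)
in_dist_acc = (nearest_centroid_predict(model, X_train) == y_train).mean()
ood_acc = (nearest_centroid_predict(model, X_test) == y_test).mean()
print(f"in-distribution acc={in_dist_acc:.2f}  held-out-user acc={ood_acc:.2f}")
```

The gap between the two accuracies is the quantity the benchmark is organized around; BenchHAR varies the pretraining objective and architecture while holding this split structure fixed.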

If this is right

  • Hybrid reconstruction-plus-contrastive pretraining should be the default starting point for new HAR models seeking generalization.
  • CNN encoders should be preferred over more complex alternatives when the goal is representations that transfer to new users and devices.
  • Pretraining data collection should prioritize samples from the exact activity classes that will appear at test time rather than broad unlabeled pools.
  • Custom-grade wearables mounted on limbs supply more transferable signals than research-grade devices or trunk placements.
  • Scaling unlabeled pretraining data from target classes produces larger gains than collecting additional labeled examples.
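As a concrete reading of the first bullet, a hybrid objective simply combines a reconstruction term with a contrastive term. The sketch below is a minimal numpy rendering under that assumption; the paper's actual losses, weighting scheme, and augmentations may differ.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE over a batch: z_a[i] and z_b[i] embed two augmented views of the
    same window; every other row of z_b serves as a negative."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives sit on the diagonal

def hybrid_loss(x, x_recon, z_a, z_b, alpha=0.5):
    """Hybrid objective: weighted sum of reconstruction MSE and InfoNCE,
    the combination the benchmark reports as strongest overall."""
    recon = np.mean((x - x_recon) ** 2)
    return alpha * recon + (1 - alpha) * info_nce(z_a, z_b)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64))           # batch of flattened sensor windows
x_recon = x + rng.normal(0.1, 0.05, size=x.shape)
z = rng.normal(size=(8, 16))           # encoder outputs for one view
loss = hybrid_loss(x, x_recon, z + 0.01 * rng.normal(size=z.shape), z)
print(f"hybrid loss: {loss:.3f}")
```

The weighting `alpha` is a free choice in this sketch; nothing here should be read as the authors' actual hyperparameters.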

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Standard SSL recipes borrowed from images or text may need targeted redesign for the temporal and positional structure of wearable sensor streams.
  • The limited benefit from non-downstream unlabeled data points to the importance of class-specific or device-specific alignment during pretraining.
  • Practitioners could combine the benchmark's data-scale findings with lightweight domain adaptation to further close the generalization gap.
  • Extending the evaluation to include cross-device calibration or real-time streaming constraints would test whether the reported patterns survive deployment conditions.

Load-bearing premise

That the chosen dataset size and the specific twelve architectures plus eight SSL methods are representative enough of real-world sensor and model variation for the generalization patterns to hold broadly.

What would settle it

Running a previously untested SSL method or encoder through the same BenchHAR splits and held-out distributions and finding it produces markedly higher accuracy on the target sets than the top hybrid CNN results would challenge the claim that existing methods struggle.

Figures

Figures reproduced from arXiv:2605.08296 by Anlan Yu, Baoshen Guo, Rui Feng, Yize Cai, and Zhiqing Hong.

Figure 1: Benchmark overview. Unlike most existing cross-dataset studies that utilize target datasets …
Figure 2: Benchmark dataset statistics. (top) Sample size distribution of each dataset. (bottom left) …
Figure 3: Pretraining paradigms for sensor-based HAR.
Figure 4: Impact of training data scale on generalization performance. Mean …
Figure 5: Distribution of the 14 sensor-based HAR datasets.
Figure 6: Impact of training data scale on the generalization performance of BioBankSSL. Mean …
Figure 7: Impact of training data scale on the generalization performance of CRT. Mean …
Figure 8: Impact of training data scale on the generalization performance of FOCAL. Mean …
Figure 9: Impact of training data scale on the generalization performance of SimMTM. Mean …
Original abstract

Human Activity Recognition (HAR) from wearable sensors supports broad healthcare and behavior science applications. However, data heterogeneity and the scarcity of labeled data limit its real-world generalization. Recent advances in self-supervised learning (SSL) in vision and language domains have shown strong capability for learning generalizable representations from unlabeled data. Yet, few studies have systematically compared the generalization performance of SSL methods or explored how to adapt them for generalizable HAR. To address these gaps, we present BenchHAR, a unified framework for evaluating the generalization capability of SSL methods for sensor-based HAR on unseen target distributions. BenchHAR curates a large-scale dataset (~258K samples) and evaluates eight representative SSL methods across 12 encoder-classifier architectures. Our results reveal that existing SSL methods struggle to achieve satisfactory generalization performance. We find that: (1) For HAR models, the hybrid paradigm (combining reconstruction and contrastive pretraining) achieves the best overall performance. The CNN encoder exhibits the strongest ability to learn generalizable representations, while more expressive classifier architectures further improve generalization. (2) For data scale, increasing the amount of pretraining data from downstream activity classes consistently improves generalization, while adding more labeled data yields limited gains. Interestingly, incorporating unlabeled data from non-downstream activity classes does not improve generalization. (3) Sensor data collected from custom-grade devices generalizes better than that from research-grade devices, and data from limb transfers more effectively to trunk positions. BenchHAR provides a unified benchmark and actionable insights for generalizable sensor-based HAR systems. Our code is available at https://github.com/saiketa/HAR-Bench.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces BenchHAR, a unified benchmark framework for evaluating the generalization performance of self-supervised learning (SSL) methods in sensor-based human activity recognition (HAR) on unseen target distributions. It curates a large-scale dataset of ~258K samples and systematically evaluates eight representative SSL methods across twelve encoder-classifier architecture pairs. Key empirical findings include the superiority of hybrid reconstruction+contrastive pretraining, the strength of CNN encoders for generalizable representations, benefits from scaling pretraining data drawn from downstream activity classes, limited gains from additional labeled data or non-downstream unlabeled data, and better transfer from custom-grade devices and limb placements to trunk positions. The work provides open code and actionable insights for building generalizable HAR systems.

Significance. If the dataset curation and architecture choices prove representative of real-world sensor heterogeneity, the benchmark would offer a valuable, reproducible resource for the HAR community by quantifying the current limitations of SSL methods and identifying concrete design choices (hybrid SSL, CNN encoders, data scaling strategies) that improve cross-distribution performance. The open-sourcing of code strengthens the contribution by enabling direct follow-up work.

major comments (2)
  1. [Section 3] Section 3 (Dataset Curation and Experimental Setup): The manuscript provides no quantitative coverage metrics—such as per-class sample counts and imbalance ratios, sensor sampling frequency histograms, device-grade distributions, or direct statistical comparisons against external corpora (e.g., PAMAP2, OPPORTUNITY, or UK Biobank subsets)—to demonstrate that the ~258K-sample collection adequately samples the space of real-world activity classes, placements, and device heterogeneity. This omission is load-bearing for the central claims about generalization performance, hybrid SSL superiority, and data-scale effects, as the reported trends could be artifacts of the specific curation rather than broadly actionable.
  2. [Section 4] Section 4 (Results on Architecture and SSL Choices): The superiority of the hybrid paradigm and CNN encoders is asserted on the basis of 12 encoder-classifier pairs and 8 SSL methods, yet the paper does not include an ablation or coverage argument showing why these families sufficiently represent modern alternatives (e.g., transformer-based encoders or recent contrastive variants such as SimCLR-v2 or MoCo-v3). Without such justification or sensitivity analysis, the ranking of paradigms risks being benchmark-specific rather than general.
minor comments (2)
  1. [Section 3] The distinction between 'custom-grade' and 'research-grade' devices is introduced in the abstract and results but would benefit from an explicit definition or table in Section 3 listing the exact sensor models and their technical specifications.
  2. [Figures] Figure captions and axis labels in the result plots should explicitly state the evaluation metric (e.g., macro-F1 or accuracy) and whether error bars represent standard deviation across seeds or subjects.
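On the second minor comment, the requested reporting is cheap to pin down. Below is a hypothetical macro-F1-with-seed-variance computation, written from the standard definition rather than from the paper's code; the simulated predictions are purely illustrative.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro-F1: unweighted mean of per-class F1, so rare activities count
    as much as frequent ones -- the usual choice for imbalanced HAR sets."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))

# The reporting convention the referee asks for: mean +/- std across seeds,
# with simulated ~80%-correct predictions standing in for a real model.
rng = np.random.default_rng(0)
scores = []
for seed in range(5):
    y_true = rng.integers(0, 3, size=200)
    y_pred = np.where(rng.random(200) < 0.8, y_true, rng.integers(0, 3, size=200))
    scores.append(macro_f1(y_true, y_pred, n_classes=3))
print(f"macro-F1 = {np.mean(scores):.3f} +/- {np.std(scores):.3f} (std over 5 seeds)")
```

Whether the paper's error bars are over seeds or over subjects changes the interpretation substantially, which is why the caption-level fix matters.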

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed the major comments carefully and agree that additional quantitative details on dataset coverage and justification for architectural choices will strengthen the work. We address each point below and commit to revisions where appropriate.

Point-by-point responses
  1. Referee: [Section 3] Section 3 (Dataset Curation and Experimental Setup): The manuscript provides no quantitative coverage metrics—such as per-class sample counts and imbalance ratios, sensor sampling frequency histograms, device-grade distributions, or direct statistical comparisons against external corpora (e.g., PAMAP2, OPPORTUNITY, or UK Biobank subsets)—to demonstrate that the ~258K-sample collection adequately samples the space of real-world activity classes, placements, and device heterogeneity. This omission is load-bearing for the central claims about generalization performance, hybrid SSL superiority, and data-scale effects, as the reported trends could be artifacts of the specific curation rather than broadly actionable.

    Authors: We agree that quantitative coverage metrics are essential to support claims about the representativeness of the curated dataset and the generalizability of our findings. In the revised manuscript, we will add a new subsection in Section 3 with tables reporting per-class sample counts, imbalance ratios, sensor sampling frequency histograms, and device-grade distributions. We will also include direct statistical comparisons (e.g., activity class overlap, placement statistics) against publicly available corpora such as PAMAP2 and OPPORTUNITY. For UK Biobank, we will note access limitations but provide proxy comparisons using available subsets where possible. These additions will clarify that the observed trends are not artifacts of curation. revision: yes

  2. Referee: [Section 4] Section 4 (Results on Architecture and SSL Choices): The superiority of the hybrid paradigm and CNN encoders is asserted on the basis of 12 encoder-classifier pairs and 8 SSL methods, yet the paper does not include an ablation or coverage argument showing why these families sufficiently represent modern alternatives (e.g., transformer-based encoders or recent contrastive variants such as SimCLR-v2 or MoCo-v3). Without such justification or sensitivity analysis, the ranking of paradigms risks being benchmark-specific rather than general.

    Authors: We selected the 8 SSL methods and 12 encoder-classifier pairs to cover the most representative approaches from the recent HAR and SSL literature (e.g., reconstruction, contrastive, and hybrid families with CNN, RNN, and MLP variants). We acknowledge the value of explicit justification and sensitivity analysis. In the revision, we will expand the discussion in Section 4 to include a rationale subsection citing prevalence in prior work and add a limited sensitivity analysis on transformer encoders where feasible within computational constraints. We will also explicitly note the absence of full evaluations for SimCLR-v2 and MoCo-v3 as a limitation and suggest it as future work, while arguing that the current scope still yields actionable design insights for generalizable HAR. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical benchmark reports experimental results without self-referential derivations or fitted predictions.

full rationale

The paper curates a dataset (~258K samples), evaluates 8 SSL methods across 12 architectures, and reports generalization performance metrics. No equations, parameter fits, or predictions are derived from prior outputs within the paper. Claims rest on direct experimental comparisons (e.g., hybrid SSL vs. others, CNN encoders, data-scale effects) that are externally verifiable via the released code and dataset. No self-citation chains or ansatzes are invoked as load-bearing justifications. This is a standard empirical benchmarking study with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard supervised and self-supervised training assumptions plus the representativeness of the chosen dataset and model zoo; no new physical or mathematical axioms are introduced.

axioms (1)
  • domain assumption Standard assumptions of i.i.d. sampling within each data distribution and that cross-distribution shifts are the primary source of generalization failure.
    Invoked implicitly when defining 'unseen target distributions' and measuring generalization.

pith-pipeline@v0.9.0 · 5604 in / 1423 out tokens · 38963 ms · 2026-05-12T01:23:26.167718+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 1 internal anchor

  1. [1]

    Human activity recognition using wearable sensors: review, challenges, evaluation benchmark

    Reem Abdel-Salam, Rana Mostafa, and Mayada Hadhood. Human activity recognition using wearable sensors: review, challenges, evaluation benchmark. InInternational workshop on deep learning for human activity recognition, pages 1–15. Springer, 2021

  2. [2]

    Nafees Ahmad and Ho-fung Leung. Hyperhar: Inter-sensing device bilateral correlations and hyper-correlations learning approach for wearable sensing device based human activity recogni- tion.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(1):1–29, 2024

  3. [3]

    Countrywide natural experiment links built environment to physical activity.Nature, 645(8080):407–413, 2025

    Tim Althoff, Boris Ivanovic, Abby C King, Jennifer L Hicks, Scott L Delp, and Jure Leskovec. Countrywide natural experiment links built environment to physical activity.Nature, 645(8080):407–413, 2025

  4. [4]

    mhealthdroid: a novel framework for agile development of mobile health applications

    Oresti Banos, Rafael Garcia, Juan A Holgado-Terriza, Miguel Damas, Hector Pomares, Ignacio Rojas, Alejandro Saez, and Claudia Villalonga. mhealthdroid: a novel framework for agile development of mobile health applications. InInternational workshop on ambient assisted living, pages 91–98. Springer, 2014

  5. [5]

    Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units.The Computer Journal, 57(11):1649–1667, 2014

    Billur Barshan and Murat Cihan Yüksek. Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units.The Computer Journal, 57(11):1649–1667, 2014

  6. [6]

    Wearable device-based health equivalence of different physical ac- tivity intensities against mortality, cardiometabolic disease, and cancer.Nature Communications, 16(1):8315, 2025

    Raaj Kishore Biswas, Matthew N Ahmadi, Adrian Bauman, Karen Milton, Nicholas A Koemel, and Emmanuel Stamatakis. Wearable device-based health equivalence of different physical ac- tivity intensities against mortality, cardiometabolic disease, and cancer.Nature Communications, 16(1):8315, 2025

  7. [7]

    Towards generalizable human activity recognition: A survey.arXiv preprint arXiv:2508.12213, 2025

    Yize Cai, Baoshen Guo, Flora Salim, and Zhiqing Hong. Towards generalizable human activity recognition: A survey.arXiv preprint arXiv:2508.12213, 2025

  8. [8]

    Capture-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition.Scientific Data, 11(1):1135, 2024

    Shing Chan, Yuan Hang, Catherine Tong, Aidan Acquah, Abram Schonfeldt, Jonathan Gershuny, and Aiden Doherty. Capture-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition.Scientific Data, 11(1):1135, 2024

  9. [9]

    Deep learning for sensor-based human activity recognition: Overview, challenges, and opportunities.ACM Computing Surveys (CSUR), 54(4):1–40, 2021

    Kaixuan Chen, Dalin Zhang, Lina Yao, Bin Guo, Zhiwen Yu, and Yunhao Liu. Deep learning for sensor-based human activity recognition: Overview, challenges, and opportunities.ACM Computing Surveys (CSUR), 54(4):1–40, 2021

  10. [10]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, pages 1597–1607. PmLR, 2020

  11. [11]

    A noise-tolerant human–machine interface based on deep learning-enhanced wearable sensors.Nature Sensors, 1(1):39–51, 2026

    Xiangjun Chen, Zhiyuan Lou, Xiaoxiang Gao, Lu Yin, Siyu Qin, Muyang Lin, Fangao Zhang, Yi Lu, Shichao Ding, Ruixiao Liu, et al. A noise-tolerant human–machine interface based on deep learning-enhanced wearable sensors.Nature Sensors, 1(1):39–51, 2026

  12. [12]

    Harsense: Statistical human activity recognition dataset, 2021

    Nurul Amin Choudhury, Soumen Moulik, and Diptendu Sinha Roy. Harsense: Statistical human activity recognition dataset, 2021

  13. [13]

    Gaole Dai, Huatao Xu, Hyungjun Yoon, Mo Li, Rui Tan, and Sung-Ju Lee. Contrastsense: Domain-invariant contrastive learning for in-the-wild wearable sensing.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(4):1–32, 2024

  14. [14]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  15. [15]

    Simmtm: A simple pre-training framework for masked time-series modeling.Advances in Neural Information Processing Systems, 36:29996–30025, 2023

    Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, and Mingsheng Long. Simmtm: A simple pre-training framework for masked time-series modeling.Advances in Neural Information Processing Systems, 36:29996–30025, 2023. 10

  16. [16]

    Comparing self-supervised learning techniques for wearable human activity recognition.CCF Transactions on Pervasive Computing and Interaction, 7(3):324–341, 2025

    Sannara Ek, Riccardo Presotto, Gabriele Civitarese, François Portet, Philippe Lalanda, and Claudio Bettini. Comparing self-supervised learning techniques for wearable human activity recognition.CCF Transactions on Pervasive Computing and Interaction, 7(3):324–341, 2025

  17. [17]

    Time-series representation learning via temporal and contextual contrasting

    Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee Keong Kwoh, Xiaoli Li, and Cuntai Guan. Time-series representation learning via temporal and contextual contrasting. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 2352–2359, 2021

  18. [18]

    Wearable technologies for assisted mobility in the real world.Nature Communications, 2025

    Shuo Gao, Jianan Chen, Yunjia Xia, Xuemeng Li, Weihao Ma, Huixin Yang, Jinchen Li, Xinkai Zhou, Tianyu Jia, Yuchen Xu, et al. Wearable technologies for assisted mobility in the real world.Nature Communications, 2025

  19. [19]

    Scaling laws for neural machine translation

    Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, and Colin Cherry. Scaling laws for neural machine translation. InInternational Conference on Learning Representations, 2022

  20. [20]

    Imagebind: One embedding space to bind them all

    Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. Imagebind: One embedding space to bind them all. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15180–15190, 2023

  21. [21]

    Virtual reality interactions via a user-generic ultrasound human-machine interface for wrist and hand tracking.Nature Commu- nications, 16(1):11062, 2025

    Bruno Grandi Sgambato, Bálint K Hodossy, Deren Yusuf Barsakcioglu, Xingchen Yang, Anette Jakob, Marc Fournelle, Meng-Xing Tang, and Dario Farina. Virtual reality interactions via a user-generic ultrasound human-machine interface for wrist and hand tracking.Nature Commu- nications, 16(1):11062, 2025

  22. [22]

    Harish Haresamudram, Irfan Essa, and Thomas Plötz. Assessing the state of self-supervised human activity recognition using wearables.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(3):1–47, 2022

  23. [23]

    Harish Haresamudram, Chi Ian Tang, Sungho Suh, Paul Lukowicz, and Thomas Ploetz. Past, present, and future of sensor-based human activity recognition using wearables: A surveying tutorial on a still challenging task.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(2):1–44, 2025

  24. [24]

    An empirical analysis of compute-optimal large language model training

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katherine Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Oriol Vinyals, Jack William Rae, and Laur...

  25. [25]

    Zhiqing Hong, Zelong Li, Shuxin Zhong, Wenjun Lyu, Haotian Wang, Yi Ding, Tian He, and Desheng Zhang. Crosshar: Generalizing cross-dataset human activity recognition via hierarchical self-supervised pretraining.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(2):1–26, 2024

  26. [26]

    Llm4har: Generalizable on-device human activity recognition with pretrained llms

    Zhiqing Hong, Yiwei Song, Zelong Li, Anlan Yu, Shuxin Zhong, Yi Ding, Tian He, and Desheng Zhang. Llm4har: Generalizable on-device human activity recognition with pretrained llms. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 4511–4521, 2025

  27. [27]

    Experience paper: Nationwide human behavior sensing in last-mile delivery

    Zhiqing Hong, Weibing Wang, Anlan Yu, Shuxin Zhong, Haotian Wang, Yi Ding, Tian He, and Desheng Zhang. Experience paper: Nationwide human behavior sensing in last-mile delivery. In Proceedings of the 31st Annual International Conference on Mobile Computing and Networking, pages 682–696, 2025

  28. [28]

    Bench- marking classical, deep, and generative models for human activity recognition.arXiv preprint arXiv:2501.08471, 2025

    Md Meem Hossain, The Anh Han, Safina Showkat Ara, and Zia Ush Shamszaman. Bench- marking classical, deep, and generative models for human activity recognition.arXiv preprint arXiv:2501.08471, 2025. 11

  29. [29]

    Swl-adapt: An unsupervised domain adaptation model with sample weight learning for cross-user wearable human activity recog- nition

    Rong Hu, Ling Chen, Shenghuan Miao, and Xing Tang. Swl-adapt: An unsupervised domain adaptation model with sample weight learning for cross-user wearable human activity recog- nition. InProceedings of the AAAI Conference on artificial intelligence, volume 37, pages 6012–6020, 2023

  30. [30]

    Refuseact: Representation fusion using self-supervised learning for activity recognition in next generation networks.Information Fusion, 102:102044, 2024

    Sunder Ali Khowaja, Parus Khuwaja, Fayaz Ali Dharejo, Saleem Raza, Ik Hyun Lee, Rizwan Ali Naqvi, and Kapal Dev. Refuseact: Representation fusion using self-supervised learning for activity recognition in next generation networks.Information Fusion, 102:102044, 2024

  31. [31]

    Wearable accelerometer-derived physical activity and incident disease.NPJ Digital Medicine, 5(1):131, 2022

    Shaan Khurshid, Lu-Chen Weng, Victor Nauffal, James P Pirruccello, Rachael A Venn, Mostafa A Al-Alusi, Emelia J Benjamin, Patrick T Ellinor, and Steven A Lubitz. Wearable accelerometer-derived physical activity and incident disease.NPJ Digital Medicine, 5(1):131, 2022

  32. [32]

    Reversible instance normalization for accurate time-series forecasting against distribution shift

    Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. InInternational conference on learning representations, 2021

  33. [33]

    Soft contrastive learning for time series

    Seunghan Lee, Taeyoung Park, and Kibok Lee. Soft contrastive learning for time series. InThe Twelfth International Conference on Learning Representations, 2024

  34. [34]

    Deep transfer learning with graph neural network for sensor-based human activity recognition

    Tianzheng Liao, Jinjin Zhao, Yushi Liu, Kamen Ivanov, Jing Xiong, and Yan Yan. Deep transfer learning with graph neural network for sensor-based human activity recognition. In2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 2445–2452. IEEE, 2022

  35. [35]

    Shengzhong Liu, Tomoyoshi Kimura, Dongxin Liu, Ruijie Wang, Jinyang Li, Suhas Diggavi, Mani Srivastava, and Tarek Abdelzaher. Focal: Contrastive learning for multimodal time- series sensing signals in factorized orthogonal latent space.Advances in Neural Information Processing Systems, 36:47309–47338, 2023

  36. [36]

    Aleksej Logacjov. Self-supervised learning for accelerometer-based human activity recogni- tion: A survey.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(4):1–42, 2024

  37. [37]

    Wang Lu, Jindong Wang, Yiqiang Chen, Sinno Jialin Pan, Chunyu Hu, and Xin Qin. Semantic- discriminative mixup for generalizable sensor-based cross-domain activity recognition.Pro- ceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(2):1–19, 2022

  38. [38]

    Diversify: A general framework for time series out-of-distribution detection and generalization

    Wang Lu, Jindong Wang, Xinwei Sun, Yiqiang Chen, Xiangyang Ji, Qiang Yang, and Xing Xie. Diversify: A general framework for time series out-of-distribution detection and generalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(6):4534–4550, 2024

  39. [39]

    Harood: A benchmark for out-of-distribution general- ization in sensor-based human activity recognition

    Wang Lu, Yao Zhu, and Jindong Wang. Harood: A benchmark for out-of-distribution general- ization in sensor-based human activity recognition. InProceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 1, pages 2746–2757, 2026

  40. [40]

    Mobile sensor data anonymization

    Mohammad Malekzadeh, Richard G Clegg, Andrea Cavallaro, and Hamed Haddadi. Mobile sensor data anonymization. InProceedings of the international conference on internet of things design and implementation, pages 49–58, 2019

  41. [41]

    Overview on wearable sensors for the management of parkinson’s disease.npj Parkinson’s Disease, 9(1):153, 2023

    Caroline Moreau, Tiphaine Rouaud, David Grabli, Isabelle Benatru, Philippe Remy, Ana- Raquel Marques, Sophie Drapier, Louise-Laure Mariani, Emmanuel Roze, David Devos, et al. Overview on wearable sensors for the management of parkinson’s disease.npj Parkinson’s Disease, 9(1):153, 2023

  42. [42]

    A benchmark for domain adaptation and generalization in smartphone-based human activity recognition.Scientific Data, 11(1):1192, 2024

    Otávio Napoli, Dami Duarte, Patrick Alves, Darlinne Hubert Palo Soto, Henrique Evangelista de Oliveira, Anderson Rocha, Levy Boccato, and Edson Borin. A benchmark for domain adaptation and generalization in smartphone-based human activity recognition.Scientific Data, 11(1):1192, 2024. 12

  43. [43]

    Tailor, Jacob Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, and Daniel McDuff

    Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, shun liao, Jake Garrison, Shyam A. Tailor, Jacob Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, and Daniel McDuff. Scaling wearable foundation models. InThe Thirteenth International Conference on Learning R...

  44. [44]

    Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018

  45. [45]

    Jaegyun Park, Dae-Won Kim, and Jaesung Lee. Calanet: Cheap all-layer aggregation for human activity recognition. Advances in Neural Information Processing Systems, 37:69419–69444, 2024

  46. [46]

    Hangwei Qian, Tian Tian, and Chunyan Miao. What makes good contrastive learning on small-scale wearable-based tasks? In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pages 3761–3771, 2022

  47. [47]

    Xin Qin, Jindong Wang, Shuo Ma, Wang Lu, Yongchun Zhu, Xing Xie, and Yiqiang Chen. Generalizable low-resource activity recognition with diverse and discriminative representation learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1943–1953, 2023

  48. [48]

    Attila Reiss and Didier Stricker. Introducing a new benchmarked dataset for activity monitoring. In 2012 16th international symposium on wearable computers, pages 108–109. IEEE, 2012

  49. [49]

    Jorge-L Reyes-Ortiz, Luca Oneto, Albert Samà, Xavier Parra, and Davide Anguita. Transition-aware human activity recognition using smartphones. Neurocomputing, 171:754–767, 2016

  50. [50]

    Aaqib Saeed, Tanir Ozcelebi, and Johan Lukkien. Multi-task self-supervised learning for human activity detection. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(2):1–30, 2019

  51. [51]

    Jingzhe Shi, Qinwei Ma, Huan Ma, and Lei Li. Scaling law for time series forecasting. Advances in Neural Information Processing Systems, 37:83314–83344, 2024

  52. [52]

    Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul JM Havinga. Fusion of smartphone motion sensors for physical activity recognition. Sensors, 14(6):10146–10176, 2014

  53. [53]

    Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul JM Havinga. Complex human activity recognition using smartphone and wrist-worn motion sensors. Sensors, 16(4):426, 2016

  54. [54]

    Niloy Sikder and Abdullah-Al Nahid. Ku-har: An open dataset for heterogeneous human activity recognition. Pattern Recognition Letters, 146:46–54, 2021

  55. [55]

    Pragya Singh, Ankush Gupta, Somay Jalan, Mohan Kumar, and Pushpendra Singh. Feel: Quantifying heterogeneity in physiological signals for generalizable emotion recognition. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025

  56. [56]

    Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition. In Proceedings of the 13th ACM conference on embedded networked sensor systems, pages 127–140, 2015

  57. [57]

    Marcin Straczkiewicz, Peter James, and Jukka-Pekka Onnela. A systematic review of smartphone-based human activity recognition methods for health research. NPJ Digital Medicine, 4(1):148, 2021

  58. [58]

    Timo Sztyler and Heiner Stuckenschmidt. On-body localization of wearable devices: An investigation of position-aware activity recognition. In 2016 IEEE international conference on pervasive computing and communications (PerCom), pages 1–9. IEEE, 2016

  59. [59]

    Astrid Ustad, Aleksej Logacjov, Stine Øverengen Trollebø, Pernille Thingstad, Beatrix Vereijken, Kerstin Bach, and Nina Skjæret Maroni. Validation of an activity type recognition model classifying daily physical behavior in older adults: the har70+ model. Sensors, 23(5):2368, 2023

  60. [60]

    Shuoyuan Wang, Jindong Wang, Huajun Xi, Bob Zhang, Lei Zhang, and Hongxin Wei. Optimization-free test-time adaptation for cross-person activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(4):1–27, 2024

  61. [61]

    Gary Weiss. WISDM Smartphone and Smartwatch Activity and Biometrics Dataset. UCI Machine Learning Repository, 2019. DOI: https://doi.org/10.24432/C5HK59

  62. [62]

    Di Xiong, Shuoyuan Wang, Lei Zhang, Wenbo Huang, and Chaolei Han. Generalizable sensor-based activity recognition via categorical concept invariant learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 923–931, 2025

  63. [63]

    Huatao Xu, Yan Zhang, Wei Gao, Guobin Shen, and Mo Li. Experience paper: Adopting activity recognition in on-demand food delivery business. In Proceedings of the 31st Annual International Conference on Mobile Computing and Networking, pages 1015–1028, 2025

  64. [64]

    Huatao Xu, Pengfei Zhou, Rui Tan, and Mo Li. Practically adopting human activity recognition. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, pages 1–15, 2023

  65. [65]

    Huatao Xu, Pengfei Zhou, Rui Tan, Mo Li, and Guobin Shen. Limu-bert: Unleashing the potential of unlabeled data for imu sensing applications. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, pages 220–233, 2021

  66. [66]

    Maxwell A Xu, Jaya Narain, Gregory Darnell, Haraldur T Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Andres Fineman, Karthik Jayaraman Raghuram, James Matthew Rehg, and Shirley You Ren. Relcon: Relative contrastive learning for a motion foundation model for wearable data. In The Thirteenth International Conference on Learning Representations, 2025

  67. [67]

    Meng Xue, Yinan Zhu, Wentao Xie, Zhixian Wang, Yanjiao Chen, Kui Jiang, and Qian Zhang. Mobhar: source-free knowledge transfer for human activity recognition on mobile devices. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(1):1–24, 2025

  68. [68]

    Takahiro Yamane, Moeka Kimura, and Mizuki Morita. Impact of sensor-axis combinations on machine learning accuracy for human activity recognition using accelerometer data in clinical settings. Physical Activity and Health, 9(1), 2025

  69. [69]

    Hua Yan, Heng Tan, Yi Ding, Pengfei Zhou, Vinod Namboodiri, and Yu Yang. Large language model-guided semantic alignment for human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(4):1–25, 2025

  70. [70]

    Hang Yuan, Shing Chan, Andrew P Creagh, Catherine Tong, Aidan Acquah, David A Clifton, and Aiden Doherty. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. NPJ digital medicine, 7(1):91, 2024

  71. [71]

    Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu. Ts2vec: Towards universal representation of time series. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 8980–8987, 2022

  72. [72]

    Hao Zhang, Zhan Zhuang, Xuehao Wang, Xiaodong Yang, and Yu Zhang. MoPFormer: Motion-primitive transformer for wearable-sensor activity recognition. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  73. [73]

    Mi Zhang and Alexander A Sawchuk. Usc-had: A daily activity dataset for ubiquitous activity recognition using wearable sensors. In Proceedings of the 2012 ACM conference on ubiquitous computing, pages 1036–1043, 2012

  74. [74]

    Wenrui Zhang, Ling Yang, Shijia Geng, and Shenda Hong. Self-supervised time series representation learning via cross reconstruction transformer. IEEE Transactions on Neural Networks and Learning Systems, 35(11):16129–16138, 2023

  75. [75]

    Xiyuan Zhang, Diyan Teng, Ranak R Chowdhury, Shuheng Li, Dezhi Hong, Rajesh K Gupta, and Jingbo Shang. Unimts: Unified pre-training for motion time series. Advances in Neural Information Processing Systems, 37:107469–107493, 2024

  76. [76]

    Yuwei Zhang, Tong Xia, Jing Han, Yu Y Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, and Cecilia Mascolo. Towards open respiratory acoustic foundation models: Pretraining and benchmarking. Advances in Neural Information Processing Systems, 37:27024–27055, 2024

  77. [77]

    Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al. One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems, 36:43322–43355, 2023

A Benchmark Dataset

A.1 Datasets Overview

We detail the information for the 14 datasets employed in our benchmark. As we focus on both accelerometer and gyroscope ...