pith. machine review for the scientific record.

arxiv: 2605.08296 · v1 · submitted 2026-05-08 · 💻 cs.CV · eess.SP


BenchHAR: Benchmarking Self-Supervised Learning for Generalizable Sensor-based Activity Recognition


Pith reviewed 2026-05-12 01:23 UTC · model grok-4.3

classification: 💻 cs.CV · eess.SP
keywords: human activity recognition · self-supervised learning · generalization · wearable sensors · benchmark · CNN encoder · hybrid pretraining · sensor data

The pith

Self-supervised learning methods for sensor-based human activity recognition struggle to generalize to unseen distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

BenchHAR evaluates how well self-supervised learning transfers to new sensor data distributions in human activity recognition, where labeled data is scarce and devices vary. The work tests eight SSL methods across twelve encoder-classifier pairs on a curated dataset of roughly 258,000 samples to find which design choices actually improve performance on held-out users, devices, and activities. A reader would care because wearable HAR supports healthcare and behavior studies, yet most current models break when moved to real-world conditions with different hardware or body placements. The results identify concrete patterns, such as hybrid pretraining working best and certain data sources transferring better than others.

Core claim

Existing SSL methods struggle to achieve satisfactory generalization performance. The hybrid paradigm combining reconstruction and contrastive pretraining achieves the best overall performance. CNN encoders exhibit the strongest ability to learn generalizable representations, while more expressive classifier architectures further improve generalization. Increasing the amount of pretraining data from downstream activity classes consistently improves generalization, while adding more labeled data yields limited gains. Incorporating unlabeled data from non-downstream activity classes does not improve generalization. Sensor data collected from custom-grade devices generalizes better than that from research-grade devices, and data from limb placements transfers more effectively to trunk positions.

What carries the argument

The BenchHAR evaluation framework, which assembles a large multi-source dataset and runs controlled comparisons of SSL pretraining objectives and model architectures to measure accuracy on unseen target distributions.
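The protocol can be sketched in miniature. The snippet below is an illustrative stand-in, not the paper's code: synthetic per-user sensor windows, a nearest-centroid classifier in place of a pretrained encoder, and a leave-users-out split playing the role of BenchHAR's held-out target distributions. Every name and parameter here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_user_data(user_id, n=60, n_classes=3, dim=6):
    """Synthetic accelerometer-like windows: each class has its own mean,
    and each user adds a private offset (a stand-in for device/placement shift)."""
    y = rng.integers(0, n_classes, size=n)
    class_means = np.arange(n_classes)[:, None] * np.ones(dim)
    user_shift = rng.normal(0.0, 0.1, size=dim) * user_id
    X = class_means[y] + user_shift + rng.normal(0.0, 0.2, size=(n, dim))
    return X, y

def nearest_centroid_fit(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    classes = np.array(sorted(centroids))
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return classes[d.argmin(axis=0)]

# Leave-users-out protocol: fit on source users, report accuracy on an unseen user
# whose distribution shift is larger than anything seen during training.
source_X, source_y = zip(*(make_user_data(u) for u in range(4)))
X_train, y_train = np.concatenate(source_X), np.concatenate(source_y)
X_test, y_test = make_user_data(user_id=9)  # held-out "target distribution"

model = nearest_centroid_fit(X_train, y_train)
in_dist_acc = (nearest_centroid_predict(model, X_train) == y_train).mean()
ood_acc = (nearest_centroid_predict(model, X_test) == y_test).mean()
print(f"in-distribution acc={in_dist_acc:.2f}  held-out-user acc={ood_acc:.2f}")
```

The gap between the two accuracies is the quantity the benchmark is organized around; BenchHAR varies the pretraining objective and architecture while holding this split structure fixed.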

If this is right

  • Hybrid reconstruction-plus-contrastive pretraining should be the default starting point for new HAR models seeking generalization.
  • CNN encoders should be preferred over more complex alternatives when the goal is representations that transfer to new users and devices.
  • Pretraining data collection should prioritize samples from the exact activity classes that will appear at test time rather than broad unlabeled pools.
  • Custom-grade wearables mounted on limbs supply more transferable signals than research-grade devices or trunk placements.
  • Scaling unlabeled pretraining data from target classes produces larger gains than collecting additional labeled examples.
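As a concrete reading of the first bullet, a hybrid objective simply combines a reconstruction term with a contrastive term. The sketch below is a minimal numpy rendering under that assumption; the paper's actual losses, weighting scheme, and augmentations may differ.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE over a batch: z_a[i] and z_b[i] embed two augmented views of the
    same window; every other row of z_b serves as a negative."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives sit on the diagonal

def hybrid_loss(x, x_recon, z_a, z_b, alpha=0.5):
    """Hybrid objective: weighted sum of reconstruction MSE and InfoNCE,
    the combination the benchmark reports as strongest overall."""
    recon = np.mean((x - x_recon) ** 2)
    return alpha * recon + (1 - alpha) * info_nce(z_a, z_b)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64))           # batch of flattened sensor windows
x_recon = x + rng.normal(0.1, 0.05, size=x.shape)
z = rng.normal(size=(8, 16))           # encoder outputs for one view
loss = hybrid_loss(x, x_recon, z + 0.01 * rng.normal(size=z.shape), z)
print(f"hybrid loss: {loss:.3f}")
```

The weighting `alpha` is a free choice in this sketch; nothing here should be read as the authors' actual hyperparameters.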

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Standard SSL recipes borrowed from images or text may need targeted redesign for the temporal and positional structure of wearable sensor streams.
  • The limited benefit from non-downstream unlabeled data points to the importance of class-specific or device-specific alignment during pretraining.
  • Practitioners could combine the benchmark's data-scale findings with lightweight domain adaptation to further close the generalization gap.
  • Extending the evaluation to include cross-device calibration or real-time streaming constraints would test whether the reported patterns survive deployment conditions.

Load-bearing premise

That the chosen dataset size and the specific twelve architectures plus eight SSL methods are representative enough of real-world sensor and model variation for the generalization patterns to hold broadly.

What would settle it

Running a previously untested SSL method or encoder through the same BenchHAR splits and held-out distributions and finding it produces markedly higher accuracy on the target sets than the top hybrid CNN results would challenge the claim that existing methods struggle.

Figures

Figures reproduced from arXiv:2605.08296 by Anlan Yu, Baoshen Guo, Rui Feng, Yize Cai, and Zhiqing Hong.

Figure 1: Benchmark overview. Unlike most existing cross-dataset studies that utilize target datasets …
Figure 2: Benchmark dataset statistics. (top) Sample size distribution of each dataset. (bottom left) …
Figure 3: Pretraining paradigms for sensor-based HAR.
Figure 4: Impact of training data scale on generalization performance. Mean …
Figure 5: Distribution of the 14 sensor-based HAR datasets.
Figure 6: Impact of training data scale on the generalization performance of BioBankSSL. Mean …
Figure 7: Impact of training data scale on the generalization performance of CRT. Mean …
Figure 8: Impact of training data scale on the generalization performance of FOCAL. Mean …
Figure 9: Impact of training data scale on the generalization performance of SimMTM. Mean …
Original abstract

Human Activity Recognition (HAR) from wearable sensors supports broad healthcare and behavior science applications. However, data heterogeneity and the scarcity of labeled data limit its real-world generalization. Recent advances in self-supervised learning (SSL) in vision and language domains have shown strong capability for learning generalizable representations from unlabeled data. Yet, few studies have systematically compared the generalization performance of SSL methods or explored how to adapt them for generalizable HAR. To address these gaps, we present BenchHAR, a unified framework for evaluating the generalization capability of SSL methods for sensor-based HAR on unseen target distributions. BenchHAR curates a large-scale dataset (~258K samples) and evaluates eight representative SSL methods across 12 encoder-classifier architectures. Our results reveal that existing SSL methods struggle to achieve satisfactory generalization performance. We find that: (1) For HAR models, the hybrid paradigm (combining reconstruction and contrastive pretraining) achieves the best overall performance. The CNN encoder exhibits the strongest ability to learn generalizable representations, while more expressive classifier architectures further improve generalization. (2) For data scale, increasing the amount of pretraining data from downstream activity classes consistently improves generalization, while adding more labeled data yields limited gains. Interestingly, incorporating unlabeled data from non-downstream activity classes does not improve generalization. (3) Sensor data collected from custom-grade devices generalizes better than that from research-grade devices, and data from limb transfers more effectively to trunk positions. BenchHAR provides a unified benchmark and actionable insights for generalizable sensor-based HAR systems. Our code is available at https://github.com/saiketa/HAR-Bench.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces BenchHAR, a unified benchmark framework for evaluating the generalization performance of self-supervised learning (SSL) methods in sensor-based human activity recognition (HAR) on unseen target distributions. It curates a large-scale dataset of ~258K samples and systematically evaluates eight representative SSL methods across twelve encoder-classifier architecture pairs. Key empirical findings include the superiority of hybrid reconstruction+contrastive pretraining, the strength of CNN encoders for generalizable representations, benefits from scaling pretraining data drawn from downstream activity classes, limited gains from additional labeled data or non-downstream unlabeled data, and better transfer from custom-grade devices and limb placements to trunk positions. The work provides open code and actionable insights for building generalizable HAR systems.

Significance. If the dataset curation and architecture choices prove representative of real-world sensor heterogeneity, the benchmark would offer a valuable, reproducible resource for the HAR community by quantifying the current limitations of SSL methods and identifying concrete design choices (hybrid SSL, CNN encoders, data scaling strategies) that improve cross-distribution performance. The open-sourcing of code strengthens the contribution by enabling direct follow-up work.

major comments (2)
  1. [Section 3] Section 3 (Dataset Curation and Experimental Setup): The manuscript provides no quantitative coverage metrics—such as per-class sample counts and imbalance ratios, sensor sampling frequency histograms, device-grade distributions, or direct statistical comparisons against external corpora (e.g., PAMAP2, OPPORTUNITY, or UK Biobank subsets)—to demonstrate that the ~258K-sample collection adequately samples the space of real-world activity classes, placements, and device heterogeneity. This omission is load-bearing for the central claims about generalization performance, hybrid SSL superiority, and data-scale effects, as the reported trends could be artifacts of the specific curation rather than broadly actionable.
  2. [Section 4] Section 4 (Results on Architecture and SSL Choices): The superiority of the hybrid paradigm and CNN encoders is asserted on the basis of 12 encoder-classifier pairs and 8 SSL methods, yet the paper does not include an ablation or coverage argument showing why these families sufficiently represent modern alternatives (e.g., transformer-based encoders or recent contrastive variants such as SimCLR-v2 or MoCo-v3). Without such justification or sensitivity analysis, the ranking of paradigms risks being benchmark-specific rather than general.
minor comments (2)
  1. [Section 3] The distinction between 'custom-grade' and 'research-grade' devices is introduced in the abstract and results but would benefit from an explicit definition or table in Section 3 listing the exact sensor models and their technical specifications.
  2. [Figures] Figure captions and axis labels in the result plots should explicitly state the evaluation metric (e.g., macro-F1 or accuracy) and whether error bars represent standard deviation across seeds or subjects.
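On the second minor comment, the requested reporting is cheap to pin down. Below is a hypothetical macro-F1-with-seed-variance computation, written from the standard definition rather than from the paper's code; the simulated predictions are purely illustrative.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro-F1: unweighted mean of per-class F1, so rare activities count
    as much as frequent ones -- the usual choice for imbalanced HAR sets."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))

# The reporting convention the referee asks for: mean +/- std across seeds,
# with simulated ~80%-correct predictions standing in for a real model.
rng = np.random.default_rng(0)
scores = []
for seed in range(5):
    y_true = rng.integers(0, 3, size=200)
    y_pred = np.where(rng.random(200) < 0.8, y_true, rng.integers(0, 3, size=200))
    scores.append(macro_f1(y_true, y_pred, n_classes=3))
print(f"macro-F1 = {np.mean(scores):.3f} +/- {np.std(scores):.3f} (std over 5 seeds)")
```

Whether the paper's error bars are over seeds or over subjects changes the interpretation substantially, which is why the caption-level fix matters.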

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed the major comments carefully and agree that additional quantitative details on dataset coverage and justification for architectural choices will strengthen the work. We address each point below and commit to revisions where appropriate.

Point-by-point responses
  1. Referee: [Section 3] Section 3 (Dataset Curation and Experimental Setup): The manuscript provides no quantitative coverage metrics—such as per-class sample counts and imbalance ratios, sensor sampling frequency histograms, device-grade distributions, or direct statistical comparisons against external corpora (e.g., PAMAP2, OPPORTUNITY, or UK Biobank subsets)—to demonstrate that the ~258K-sample collection adequately samples the space of real-world activity classes, placements, and device heterogeneity. This omission is load-bearing for the central claims about generalization performance, hybrid SSL superiority, and data-scale effects, as the reported trends could be artifacts of the specific curation rather than broadly actionable.

    Authors: We agree that quantitative coverage metrics are essential to support claims about the representativeness of the curated dataset and the generalizability of our findings. In the revised manuscript, we will add a new subsection in Section 3 with tables reporting per-class sample counts, imbalance ratios, sensor sampling frequency histograms, and device-grade distributions. We will also include direct statistical comparisons (e.g., activity class overlap, placement statistics) against publicly available corpora such as PAMAP2 and OPPORTUNITY. For UK Biobank, we will note access limitations but provide proxy comparisons using available subsets where possible. These additions will clarify that the observed trends are not artifacts of curation. revision: yes

  2. Referee: [Section 4] Section 4 (Results on Architecture and SSL Choices): The superiority of the hybrid paradigm and CNN encoders is asserted on the basis of 12 encoder-classifier pairs and 8 SSL methods, yet the paper does not include an ablation or coverage argument showing why these families sufficiently represent modern alternatives (e.g., transformer-based encoders or recent contrastive variants such as SimCLR-v2 or MoCo-v3). Without such justification or sensitivity analysis, the ranking of paradigms risks being benchmark-specific rather than general.

    Authors: We selected the 8 SSL methods and 12 encoder-classifier pairs to cover the most representative approaches from the recent HAR and SSL literature (e.g., reconstruction, contrastive, and hybrid families with CNN, RNN, and MLP variants). We acknowledge the value of explicit justification and sensitivity analysis. In the revision, we will expand the discussion in Section 4 to include a rationale subsection citing prevalence in prior work and add a limited sensitivity analysis on transformer encoders where feasible within computational constraints. We will also explicitly note the absence of full evaluations for SimCLR-v2 and MoCo-v3 as a limitation and suggest it as future work, while arguing that the current scope still yields actionable design insights for generalizable HAR. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical benchmark reports experimental results without self-referential derivations or fitted predictions.

full rationale

The paper curates a dataset (~258K samples), evaluates 8 SSL methods across 12 architectures, and reports generalization performance metrics. No equations, parameter fits, or predictions are derived from prior outputs within the paper. Claims rest on direct experimental comparisons (e.g., hybrid SSL vs. others, CNN encoders, data-scale effects) that are externally verifiable via the released code and dataset. No self-citation chains or ansatzes are invoked as load-bearing justifications. This is a standard empirical benchmarking study with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard supervised and self-supervised training assumptions plus the representativeness of the chosen dataset and model zoo; no new physical or mathematical axioms are introduced.

axioms (1)
  • domain assumption Standard assumptions of i.i.d. sampling within each data distribution and that cross-distribution shifts are the primary source of generalization failure.
    Invoked implicitly when defining 'unseen target distributions' and measuring generalization.

pith-pipeline@v0.9.0 · 5604 in / 1423 out tokens · 38963 ms · 2026-05-12T01:23:26.167718+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · 1 internal anchor

  1. [1]

    Human activity recognition using wearable sensors: review, challenges, evaluation benchmark

    Reem Abdel-Salam, Rana Mostafa, and Mayada Hadhood. Human activity recognition using wearable sensors: review, challenges, evaluation benchmark. InInternational workshop on deep learning for human activity recognition, pages 1–15. Springer, 2021

  2. [2]

    Nafees Ahmad and Ho-fung Leung. Hyperhar: Inter-sensing device bilateral correlations and hyper-correlations learning approach for wearable sensing device based human activity recogni- tion.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(1):1–29, 2024

  3. [3]

    Countrywide natural experiment links built environment to physical activity.Nature, 645(8080):407–413, 2025

    Tim Althoff, Boris Ivanovic, Abby C King, Jennifer L Hicks, Scott L Delp, and Jure Leskovec. Countrywide natural experiment links built environment to physical activity.Nature, 645(8080):407–413, 2025

  4. [4]

    mhealthdroid: a novel framework for agile development of mobile health applications

    Oresti Banos, Rafael Garcia, Juan A Holgado-Terriza, Miguel Damas, Hector Pomares, Ignacio Rojas, Alejandro Saez, and Claudia Villalonga. mhealthdroid: a novel framework for agile development of mobile health applications. InInternational workshop on ambient assisted living, pages 91–98. Springer, 2014

  5. [5]

    Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units.The Computer Journal, 57(11):1649–1667, 2014

    Billur Barshan and Murat Cihan Yüksek. Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units.The Computer Journal, 57(11):1649–1667, 2014

  6. [6]

    Wearable device-based health equivalence of different physical ac- tivity intensities against mortality, cardiometabolic disease, and cancer.Nature Communications, 16(1):8315, 2025

    Raaj Kishore Biswas, Matthew N Ahmadi, Adrian Bauman, Karen Milton, Nicholas A Koemel, and Emmanuel Stamatakis. Wearable device-based health equivalence of different physical ac- tivity intensities against mortality, cardiometabolic disease, and cancer.Nature Communications, 16(1):8315, 2025

  7. [7]

    Towards generalizable human activity recognition: A survey.arXiv preprint arXiv:2508.12213, 2025

    Yize Cai, Baoshen Guo, Flora Salim, and Zhiqing Hong. Towards generalizable human activity recognition: A survey.arXiv preprint arXiv:2508.12213, 2025

  8. [8]

    Capture-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition.Scientific Data, 11(1):1135, 2024

    Shing Chan, Yuan Hang, Catherine Tong, Aidan Acquah, Abram Schonfeldt, Jonathan Gershuny, and Aiden Doherty. Capture-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition.Scientific Data, 11(1):1135, 2024

  9. [9]

    Deep learning for sensor-based human activity recognition: Overview, challenges, and opportunities.ACM Computing Surveys (CSUR), 54(4):1–40, 2021

    Kaixuan Chen, Dalin Zhang, Lina Yao, Bin Guo, Zhiwen Yu, and Yunhao Liu. Deep learning for sensor-based human activity recognition: Overview, challenges, and opportunities.ACM Computing Surveys (CSUR), 54(4):1–40, 2021

  10. [10]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, pages 1597–1607. PmLR, 2020

  11. [11]

    A noise-tolerant human–machine interface based on deep learning-enhanced wearable sensors.Nature Sensors, 1(1):39–51, 2026

    Xiangjun Chen, Zhiyuan Lou, Xiaoxiang Gao, Lu Yin, Siyu Qin, Muyang Lin, Fangao Zhang, Yi Lu, Shichao Ding, Ruixiao Liu, et al. A noise-tolerant human–machine interface based on deep learning-enhanced wearable sensors.Nature Sensors, 1(1):39–51, 2026

  12. [12]

    Harsense: Statistical human activity recognition dataset, 2021

    Nurul Amin Choudhury, Soumen Moulik, and Diptendu Sinha Roy. Harsense: Statistical human activity recognition dataset, 2021

  13. [13]

    Gaole Dai, Huatao Xu, Hyungjun Yoon, Mo Li, Rui Tan, and Sung-Ju Lee. Contrastsense: Domain-invariant contrastive learning for in-the-wild wearable sensing.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(4):1–32, 2024

  14. [14]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  15. [15]

    Simmtm: A simple pre-training framework for masked time-series modeling.Advances in Neural Information Processing Systems, 36:29996–30025, 2023

    Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, and Mingsheng Long. Simmtm: A simple pre-training framework for masked time-series modeling.Advances in Neural Information Processing Systems, 36:29996–30025, 2023. 10

  16. [16]

    Comparing self-supervised learning techniques for wearable human activity recognition.CCF Transactions on Pervasive Computing and Interaction, 7(3):324–341, 2025

    Sannara Ek, Riccardo Presotto, Gabriele Civitarese, François Portet, Philippe Lalanda, and Claudio Bettini. Comparing self-supervised learning techniques for wearable human activity recognition.CCF Transactions on Pervasive Computing and Interaction, 7(3):324–341, 2025

  17. [17]

    Time-series representation learning via temporal and contextual contrasting

    Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee Keong Kwoh, Xiaoli Li, and Cuntai Guan. Time-series representation learning via temporal and contextual contrasting. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 2352–2359, 2021

  18. [18]

    Wearable technologies for assisted mobility in the real world.Nature Communications, 2025

    Shuo Gao, Jianan Chen, Yunjia Xia, Xuemeng Li, Weihao Ma, Huixin Yang, Jinchen Li, Xinkai Zhou, Tianyu Jia, Yuchen Xu, et al. Wearable technologies for assisted mobility in the real world.Nature Communications, 2025

  19. [19]

    Scaling laws for neural machine translation

    Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, and Colin Cherry. Scaling laws for neural machine translation. InInternational Conference on Learning Representations, 2022

  20. [20]

    Imagebind: One embedding space to bind them all

    Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. Imagebind: One embedding space to bind them all. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15180–15190, 2023

  21. [21]

    Virtual reality interactions via a user-generic ultrasound human-machine interface for wrist and hand tracking.Nature Commu- nications, 16(1):11062, 2025

    Bruno Grandi Sgambato, Bálint K Hodossy, Deren Yusuf Barsakcioglu, Xingchen Yang, Anette Jakob, Marc Fournelle, Meng-Xing Tang, and Dario Farina. Virtual reality interactions via a user-generic ultrasound human-machine interface for wrist and hand tracking.Nature Commu- nications, 16(1):11062, 2025

  22. [22]

    Harish Haresamudram, Irfan Essa, and Thomas Plötz. Assessing the state of self-supervised human activity recognition using wearables.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(3):1–47, 2022

  23. [23]

    Harish Haresamudram, Chi Ian Tang, Sungho Suh, Paul Lukowicz, and Thomas Ploetz. Past, present, and future of sensor-based human activity recognition using wearables: A surveying tutorial on a still challenging task.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(2):1–44, 2025

  24. [24]

    An empirical analysis of compute-optimal large language model training

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katherine Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Oriol Vinyals, Jack William Rae, and Laur...

  25. [25]

    Zhiqing Hong, Zelong Li, Shuxin Zhong, Wenjun Lyu, Haotian Wang, Yi Ding, Tian He, and Desheng Zhang. Crosshar: Generalizing cross-dataset human activity recognition via hierarchical self-supervised pretraining.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(2):1–26, 2024

  26. [26]

    Llm4har: Generalizable on-device human activity recognition with pretrained llms

    Zhiqing Hong, Yiwei Song, Zelong Li, Anlan Yu, Shuxin Zhong, Yi Ding, Tian He, and Desheng Zhang. Llm4har: Generalizable on-device human activity recognition with pretrained llms. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 4511–4521, 2025

  27. [27]

    Experience paper: Nationwide human behavior sensing in last-mile delivery

    Zhiqing Hong, Weibing Wang, Anlan Yu, Shuxin Zhong, Haotian Wang, Yi Ding, Tian He, and Desheng Zhang. Experience paper: Nationwide human behavior sensing in last-mile delivery. In Proceedings of the 31st Annual International Conference on Mobile Computing and Networking, pages 682–696, 2025

  28. [28]

    Bench- marking classical, deep, and generative models for human activity recognition.arXiv preprint arXiv:2501.08471, 2025

    Md Meem Hossain, The Anh Han, Safina Showkat Ara, and Zia Ush Shamszaman. Bench- marking classical, deep, and generative models for human activity recognition.arXiv preprint arXiv:2501.08471, 2025. 11

  29. [29]

    Swl-adapt: An unsupervised domain adaptation model with sample weight learning for cross-user wearable human activity recog- nition

    Rong Hu, Ling Chen, Shenghuan Miao, and Xing Tang. Swl-adapt: An unsupervised domain adaptation model with sample weight learning for cross-user wearable human activity recog- nition. InProceedings of the AAAI Conference on artificial intelligence, volume 37, pages 6012–6020, 2023

  30. [30]

    Refuseact: Representation fusion using self-supervised learning for activity recognition in next generation networks.Information Fusion, 102:102044, 2024

    Sunder Ali Khowaja, Parus Khuwaja, Fayaz Ali Dharejo, Saleem Raza, Ik Hyun Lee, Rizwan Ali Naqvi, and Kapal Dev. Refuseact: Representation fusion using self-supervised learning for activity recognition in next generation networks.Information Fusion, 102:102044, 2024

  31. [31]

    Wearable accelerometer-derived physical activity and incident disease.NPJ Digital Medicine, 5(1):131, 2022

    Shaan Khurshid, Lu-Chen Weng, Victor Nauffal, James P Pirruccello, Rachael A Venn, Mostafa A Al-Alusi, Emelia J Benjamin, Patrick T Ellinor, and Steven A Lubitz. Wearable accelerometer-derived physical activity and incident disease.NPJ Digital Medicine, 5(1):131, 2022

  32. [32]

    Reversible instance normalization for accurate time-series forecasting against distribution shift

    Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. InInternational conference on learning representations, 2021

  33. [33]

    Soft contrastive learning for time series

    Seunghan Lee, Taeyoung Park, and Kibok Lee. Soft contrastive learning for time series. InThe Twelfth International Conference on Learning Representations, 2024

  34. [34]

    Deep transfer learning with graph neural network for sensor-based human activity recognition

    Tianzheng Liao, Jinjin Zhao, Yushi Liu, Kamen Ivanov, Jing Xiong, and Yan Yan. Deep transfer learning with graph neural network for sensor-based human activity recognition. In2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 2445–2452. IEEE, 2022

  35. [35]

    Shengzhong Liu, Tomoyoshi Kimura, Dongxin Liu, Ruijie Wang, Jinyang Li, Suhas Diggavi, Mani Srivastava, and Tarek Abdelzaher. Focal: Contrastive learning for multimodal time- series sensing signals in factorized orthogonal latent space.Advances in Neural Information Processing Systems, 36:47309–47338, 2023

  36. [36]

    Aleksej Logacjov. Self-supervised learning for accelerometer-based human activity recogni- tion: A survey.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(4):1–42, 2024

  37. [37]

    Wang Lu, Jindong Wang, Yiqiang Chen, Sinno Jialin Pan, Chunyu Hu, and Xin Qin. Semantic- discriminative mixup for generalizable sensor-based cross-domain activity recognition.Pro- ceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(2):1–19, 2022

  38. [38]

    Diversify: A general framework for time series out-of-distribution detection and generalization

    Wang Lu, Jindong Wang, Xinwei Sun, Yiqiang Chen, Xiangyang Ji, Qiang Yang, and Xing Xie. Diversify: A general framework for time series out-of-distribution detection and generalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(6):4534–4550, 2024

  39. [39]

    Harood: A benchmark for out-of-distribution general- ization in sensor-based human activity recognition

    Wang Lu, Yao Zhu, and Jindong Wang. Harood: A benchmark for out-of-distribution general- ization in sensor-based human activity recognition. InProceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 1, pages 2746–2757, 2026

  40. [40]

    Mobile sensor data anonymization

    Mohammad Malekzadeh, Richard G Clegg, Andrea Cavallaro, and Hamed Haddadi. Mobile sensor data anonymization. InProceedings of the international conference on internet of things design and implementation, pages 49–58, 2019

  41. [41]

    Overview on wearable sensors for the management of parkinson’s disease.npj Parkinson’s Disease, 9(1):153, 2023

    Caroline Moreau, Tiphaine Rouaud, David Grabli, Isabelle Benatru, Philippe Remy, Ana- Raquel Marques, Sophie Drapier, Louise-Laure Mariani, Emmanuel Roze, David Devos, et al. Overview on wearable sensors for the management of parkinson’s disease.npj Parkinson’s Disease, 9(1):153, 2023

  42. [42]

    A benchmark for domain adaptation and generalization in smartphone-based human activity recognition.Scientific Data, 11(1):1192, 2024

    Otávio Napoli, Dami Duarte, Patrick Alves, Darlinne Hubert Palo Soto, Henrique Evangelista de Oliveira, Anderson Rocha, Levy Boccato, and Edson Borin. A benchmark for domain adaptation and generalization in smartphone-based human activity recognition.Scientific Data, 11(1):1192, 2024. 12

  43. [43]

    Tailor, Jacob Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, and Daniel McDuff

    Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, shun liao, Jake Garrison, Shyam A. Tailor, Jacob Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, and Daniel McDuff. Scaling wearable foundation models. InThe Thirteenth International Conference on Learning R...

  44. [44]

    Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018

  45. [45]

    Jaegyun Park, Dae-Won Kim, and Jaesung Lee. Calanet: Cheap all-layer aggregation for human activity recognition. Advances in Neural Information Processing Systems, 37:69419–69444, 2024

  46. [46]

    Hangwei Qian, Tian Tian, and Chunyan Miao. What makes good contrastive learning on small-scale wearable-based tasks? In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pages 3761–3771, 2022

  47. [47]

    Xin Qin, Jindong Wang, Shuo Ma, Wang Lu, Yongchun Zhu, Xing Xie, and Yiqiang Chen. Generalizable low-resource activity recognition with diverse and discriminative representation learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1943–1953, 2023

  48. [48]

    Attila Reiss and Didier Stricker. Introducing a new benchmarked dataset for activity monitoring. In 2012 16th international symposium on wearable computers, pages 108–109. IEEE, 2012

  49. [49]

    Jorge-L Reyes-Ortiz, Luca Oneto, Albert Samà, Xavier Parra, and Davide Anguita. Transition-aware human activity recognition using smartphones. Neurocomputing, 171:754–767, 2016

  50. [50]

    Aaqib Saeed, Tanir Ozcelebi, and Johan Lukkien. Multi-task self-supervised learning for human activity detection. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(2):1–30, 2019

  51. [51]

    Jingzhe Shi, Qinwei Ma, Huan Ma, and Lei Li. Scaling law for time series forecasting. Advances in Neural Information Processing Systems, 37:83314–83344, 2024

  52. [52]

    Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul JM Havinga. Fusion of smartphone motion sensors for physical activity recognition. Sensors, 14(6):10146–10176, 2014

  53. [53]

    Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul JM Havinga. Complex human activity recognition using smartphone and wrist-worn motion sensors. Sensors, 16(4):426, 2016

  54. [54]

    Niloy Sikder and Abdullah-Al Nahid. Ku-har: An open dataset for heterogeneous human activity recognition. Pattern Recognition Letters, 146:46–54, 2021

  55. [55]

    Pragya Singh, Ankush Gupta, Somay Jalan, Mohan Kumar, and Pushpendra Singh. Feel: Quantifying heterogeneity in physiological signals for generalizable emotion recognition. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025

  56. [56]

    Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition. In Proceedings of the 13th ACM conference on embedded networked sensor systems, pages 127–140, 2015

  57. [57]

    Marcin Straczkiewicz, Peter James, and Jukka-Pekka Onnela. A systematic review of smartphone-based human activity recognition methods for health research. NPJ Digital Medicine, 4(1):148, 2021

  58. [58]

    Timo Sztyler and Heiner Stuckenschmidt. On-body localization of wearable devices: An investigation of position-aware activity recognition. In 2016 IEEE international conference on pervasive computing and communications (PerCom), pages 1–9. IEEE, 2016

  59. [59]

    Astrid Ustad, Aleksej Logacjov, Stine Øverengen Trollebø, Pernille Thingstad, Beatrix Vereijken, Kerstin Bach, and Nina Skjæret Maroni. Validation of an activity type recognition model classifying daily physical behavior in older adults: the har70+ model. Sensors, 23(5):2368, 2023

  60. [60]

    Shuoyuan Wang, Jindong Wang, Huajun Xi, Bob Zhang, Lei Zhang, and Hongxin Wei. Optimization-free test-time adaptation for cross-person activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(4):1–27, 2024

  61. [61]

    Gary Weiss. WISDM Smartphone and Smartwatch Activity and Biometrics Dataset. UCI Machine Learning Repository, 2019. DOI: https://doi.org/10.24432/C5HK59

  62. [62]

    Di Xiong, Shuoyuan Wang, Lei Zhang, Wenbo Huang, and Chaolei Han. Generalizable sensor-based activity recognition via categorical concept invariant learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 923–931, 2025

  63. [63]

    Huatao Xu, Yan Zhang, Wei Gao, Guobin Shen, and Mo Li. Experience paper: Adopting activity recognition in on-demand food delivery business. In Proceedings of the 31st Annual International Conference on Mobile Computing and Networking, pages 1015–1028, 2025

  64. [64]

    Huatao Xu, Pengfei Zhou, Rui Tan, and Mo Li. Practically adopting human activity recognition. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, pages 1–15, 2023

  65. [65]

    Huatao Xu, Pengfei Zhou, Rui Tan, Mo Li, and Guobin Shen. Limu-bert: Unleashing the potential of unlabeled data for imu sensing applications. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, pages 220–233, 2021

  66. [66]

    Maxwell A Xu, Jaya Narain, Gregory Darnell, Haraldur T Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Andres Fineman, Karthik Jayaraman Raghuram, James Matthew Rehg, and Shirley You Ren. Relcon: Relative contrastive learning for a motion foundation model for wearable data. In The Thirteenth International Conference on Learning Representations, 2025

  67. [67]

    Meng Xue, Yinan Zhu, Wentao Xie, Zhixian Wang, Yanjiao Chen, Kui Jiang, and Qian Zhang. Mobhar: source-free knowledge transfer for human activity recognition on mobile devices. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(1):1–24, 2025

  68. [68]

    Takahiro Yamane, Moeka Kimura, and Mizuki Morita. Impact of sensor-axis combinations on machine learning accuracy for human activity recognition using accelerometer data in clinical settings. Physical Activity and Health, 9(1), 2025

  69. [69]

    Hua Yan, Heng Tan, Yi Ding, Pengfei Zhou, Vinod Namboodiri, and Yu Yang. Large language model-guided semantic alignment for human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(4):1–25, 2025

  70. [70]

    Hang Yuan, Shing Chan, Andrew P Creagh, Catherine Tong, Aidan Acquah, David A Clifton, and Aiden Doherty. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. NPJ digital medicine, 7(1):91, 2024

  71. [71]

    Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu. Ts2vec: Towards universal representation of time series. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 8980–8987, 2022

  72. [72]

    Hao Zhang, Zhan Zhuang, Xuehao Wang, Xiaodong Yang, and Yu Zhang. MoPFormer: Motion-primitive transformer for wearable-sensor activity recognition. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  73. [73]

    Mi Zhang and Alexander A Sawchuk. Usc-had: A daily activity dataset for ubiquitous activity recognition using wearable sensors. In Proceedings of the 2012 ACM conference on ubiquitous computing, pages 1036–1043, 2012

  74. [74]

    Wenrui Zhang, Ling Yang, Shijia Geng, and Shenda Hong. Self-supervised time series representation learning via cross reconstruction transformer. IEEE Transactions on Neural Networks and Learning Systems, 35(11):16129–16138, 2023

  75. [75]

    Xiyuan Zhang, Diyan Teng, Ranak R Chowdhury, Shuheng Li, Dezhi Hong, Rajesh K Gupta, and Jingbo Shang. Unimts: Unified pre-training for motion time series. Advances in Neural Information Processing Systems, 37:107469–107493, 2024

  76. [76]

    Yuwei Zhang, Tong Xia, Jing Han, Yu Y Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, and Cecilia Mascolo. Towards open respiratory acoustic foundation models: Pretraining and benchmarking. Advances in Neural Information Processing Systems, 37:27024–27055, 2024

  77. [77]

    Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al. One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems, 36:43322–43355, 2023

A Benchmark Dataset

A.1 Datasets Overview

We detail the information for the 14 datasets employed in our benchmark. As we focus on both accelerometer and gyroscope ...