AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

Baiyu Chen; Benjamin Tag; Flora Salim; Hao Xue; Lihuan Li; Wilson Wongso; Xiachong Lin; Zechen Li

arxiv: 2605.22715 · v1 · pith:6C5D5ZRAnew · submitted 2026-05-21 · 💻 cs.CV · cs.AI· cs.CL· cs.HC

AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

Baiyu Chen , Zechen Li , Wilson Wongso , Lihuan Li , Xiachong Lin , Hao Xue , Benjamin Tag , Flora Salim This is my paper

Pith reviewed 2026-05-22 06:21 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.CLcs.HC

keywords setup-agnostic modelingIMU simulationhuman motionzero-shot learningwearable sensorsactivity recognitionmotion captioninggraph encoder

0 comments

The pith

AnyMo learns setup-independent motion representations by simulating IMU signals from dense body placements and aligning them with language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AnyMo to handle the fact that inertial signals from wearables vary strongly with body location, mounting, orientation, hardware, and sampling. It generates many plausible synthetic signals through physics-based simulation across numerous points on the body surface. These signals train a graph encoder to reconstruct full motion from partial masked views, after which the encoder produces motion tokens that are aligned with a large language model. The resulting system supports zero-shot activity recognition, retrieval between IMU and text, and motion captioning. A reader would care because this could let everyday wearables deliver consistent motion analysis without retraining for each new device or placement.

Core claim

AnyMo is a geometry-aware framework that uses physics-grounded IMU simulation over dense body-surface placements to generate diverse and plausible synthetic signals, pre-trains a graph encoder from paired synthetic placement views and masked partial observations, tokenizes multi-position IMU into full-body motion tokens, and aligns these tokens with an LLM for motion-language understanding, yielding gains of 11.7%/11.6%/22.6% in Accuracy/F1/R@2 on HAR, 15.9% and 28.6% MRR lifts in zero-shot retrieval, and 18.8% BERT-F1 in zero-shot captioning across 14 unseen datasets.

What carries the argument

Physics-grounded IMU simulation over dense body-surface placements that generates diverse synthetic signals used to pre-train a graph encoder on paired placement views and masked observations before tokenization and LLM alignment.

If this is right

Zero-shot activity recognition achieves 11.7% higher accuracy and 11.6% higher F1 across 14 previously unseen datasets.
Zero-shot cross-modal retrieval reaches 15.9% higher MRR for IMU-to-text and 28.6% higher MRR for text-to-IMU.
Zero-shot motion captioning from wearable IMU data improves by 18.8% in BERT-F1 score.
The same pre-trained tokens support generalist motion-language tasks without dataset-specific fine-tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dense-placement simulation strategy could be adapted to other wearable modalities such as pressure or temperature sensors.
Combining the motion tokens with camera or audio streams might produce more robust real-time activity tracking systems.
Long-term deployment on consumer devices could reveal whether the learned representations remain stable over weeks of continuous use.

Load-bearing premise

Physics-grounded IMU simulation over dense body-surface placements generates diverse and plausible synthetic signals that successfully transfer to real-world wearable data across varying body locations, mounting positions, and hardware.

What would settle it

Measuring whether performance gains vanish when the trained model is tested on IMU data from a new body location, mounting orientation, or sensor hardware whose signal statistics were never included in the simulation.

Figures

Figures reproduced from arXiv: 2605.22715 by Baiyu Chen, Benjamin Tag, Flora Salim, Hao Xue, Lihuan Li, Wilson Wongso, Xiachong Lin, Zechen Li.

**Figure 2.** Figure 2: Physics-grounded geometryaware motion simulation. To define a consistent in-surface direction, we set an anatomical axis ui from ci toward the centroid of its nearest available child segment in the body kinematic tree, or along the opposite direction from its nearest available parent when no child segment is available. For each vertex v ∈ Vi , we compute a surface normal ni,v from the template mesh fac… view at source ↗

**Figure 3.** Figure 3: Details of (1) Geometry-Aware Pre-Training, (2) Full-Body IMU Tokenization and (3) [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Details of Contrastive Instruction Tuning (left) and inference phases (right) of AnyMo. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative Results of Wearable IMU Motion Caption Generation. We use green to highlight [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: UMAP visualization of paired real and synthetic IMU embeddings for ten activity categories. [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: More details of Masked IMU Tokenization and Motion Language Model Pre-Training. [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: Prompt templates used for motion-language pre-training, instruction tuning, contrastive [PITH_FULL_IMAGE:figures/full_fig_p028_8.png] view at source ↗

read the original abstract

As wearable and mobile devices become increasingly embedded in daily life, they offer a practical way to continuously sense human motion in the wild. But inertial signals are highly dependent on the sensing setup, including body location, mounting position, sensor orientation, device hardware, and sampling protocol. This setup dependence makes it difficult to learn motion representations that transfer across devices and datasets, and limits the broader use of wearable IMUs beyond closed-set recognition. We introduce AnyMo, a geometry-aware framework for setup-agnostic human motion modeling. AnyMo uses physics-grounded IMU simulation over dense body-surface placements to generate diverse and plausible synthetic signals, pre-trains a graph encoder from paired synthetic placement views and masked partial observations, tokenizes multi-position IMU into full-body motion tokens, and aligns these tokens with an LLM for motion-language understanding. We evaluate AnyMo on three complementary tasks: zero-shot activity recognition across 14 unseen downstream datasets, cross-modal retrieval, and wearable IMU motion captioning, where it improves average Accuracy/F1/R@2 by 11.7\%/11.6\%/22.6\% on HAR, increases zero-shot IMU-to-text and text-to-IMU retrieval MRR by 15.9\% and 28.6\%, respectively, and improves zero-shot captioning BERT-F1 by 18.8\%. These results support AnyMo as a generalist model for wearable motion understanding in the wild. Project page: https://baiyuchen.com/project/AnyMo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AnyMo's dense physics simulation plus graph pretraining and LLM alignment for setup-agnostic IMU motion is a fresh pipeline, but the sim-to-real transfer still lacks direct checks.

read the letter

The main thing to know is that AnyMo tackles IMU setup dependence by simulating physics-based signals over dense body-surface placements, pre-training a graph encoder on paired synthetic views and masked observations, tokenizing the multi-position data into motion tokens, and aligning those tokens with an LLM for language tasks. That full stack for arbitrary device locations and hardware is not something described in prior work on wearable motion modeling.

Referee Report

2 major / 2 minor

Summary. The paper introduces AnyMo, a geometry-aware framework for setup-agnostic human motion modeling from wearable IMUs. It generates synthetic IMU signals via physics-grounded simulation over dense body-surface placements, pre-trains a graph encoder on paired synthetic views and masked observations, tokenizes multi-position IMU data into full-body motion tokens, and aligns these with an LLM for motion-language tasks. The work evaluates on zero-shot activity recognition across 14 unseen datasets, cross-modal retrieval, and IMU motion captioning, reporting average gains of 11.7%/11.6%/22.6% on HAR metrics, 15.9%/28.6% MRR improvements on retrieval, and 18.8% BERT-F1 on captioning.

Significance. If the simulation-to-real transfer holds and the gains are attributable to the proposed components rather than other factors, AnyMo could provide a practical path toward generalist models for wearable motion understanding that generalize across body locations, mounting positions, and hardware. The use of dense physics-based synthetic data for pre-training and the multi-task zero-shot evaluation on unseen datasets represent a strength in addressing setup dependence, which is a known barrier in the field.

major comments (2)

[§3.1] §3.1 (Physics-Grounded IMU Simulation): The central claim that dense body-surface physics simulation produces diverse, plausible signals that successfully transfer to real-world wearable data across varying placements and hardware lacks direct quantitative fidelity validation. No metrics such as signal distribution distances, noise spectra comparisons, or domain discrepancy measures (e.g., MMD or Wasserstein distance) between simulated and real traces at matched positions are reported. This is load-bearing because the observed gains on the 14 unseen datasets could arise from the graph encoder, tokenization, or LLM alignment rather than the simulation itself.
[§5] §5 (Evaluation and Baselines): The abstract and results claim substantial improvements (e.g., 11.7% Accuracy on HAR) but provide insufficient details on baseline implementations, statistical significance testing, dataset characteristics, or explicit controls for simulation-to-real gap. Without these, it is difficult to attribute gains specifically to the geometry-aware components versus other modeling choices.

minor comments (2)

[§3.2] Notation for graph encoder inputs and tokenization could be clarified with an explicit diagram or equation set showing how partial observations map to full-body tokens.
[Figure 4] Figure captions for qualitative results should include more detail on the specific body placements and hardware variations shown.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying our approach and outlining planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [§3.1] §3.1 (Physics-Grounded IMU Simulation): The central claim that dense body-surface physics simulation produces diverse, plausible signals that successfully transfer to real-world wearable data across varying placements and hardware lacks direct quantitative fidelity validation. No metrics such as signal distribution distances, noise spectra comparisons, or domain discrepancy measures (e.g., MMD or Wasserstein distance) between simulated and real traces at matched positions are reported. This is load-bearing because the observed gains on the 14 unseen datasets could arise from the graph encoder, tokenization, or LLM alignment rather than the simulation itself.

Authors: We acknowledge that the manuscript does not include direct quantitative fidelity metrics such as MMD or Wasserstein distance between simulated and real IMU traces at matched positions. Our validation of the simulation relied primarily on the strong zero-shot generalization across 14 unseen real-world datasets, which we interpret as indirect evidence of successful transfer given the physics-grounded modeling. To directly address this concern and better isolate the simulation's contribution, we will add quantitative domain discrepancy analyses (including MMD and Wasserstein distances) as well as noise spectra comparisons for representative positions in the revised version. revision: yes
Referee: [§5] §5 (Evaluation and Baselines): The abstract and results claim substantial improvements (e.g., 11.7% Accuracy on HAR) but provide insufficient details on baseline implementations, statistical significance testing, dataset characteristics, or explicit controls for simulation-to-real gap. Without these, it is difficult to attribute gains specifically to the geometry-aware components versus other modeling choices.

Authors: We agree that additional experimental details would improve clarity and help attribute gains more precisely. In the revision, we will expand the evaluation section and supplementary material to provide: explicit descriptions of baseline implementations and adaptations, results of statistical significance testing (e.g., p-values across multiple runs), summary characteristics for all 14 datasets, and further controls/ablation studies that isolate the contribution of the geometry-aware simulation and pre-training from other modeling choices such as tokenization or LLM alignment. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework with independent evaluation

full rationale

The paper presents AnyMo as a framework that generates synthetic IMU signals via physics-grounded simulation over dense placements, pre-trains a graph encoder on paired views and masked observations, tokenizes multi-position data into motion tokens, and aligns them with an LLM. All reported gains (e.g., +11.7% Accuracy on HAR across 14 unseen datasets, improved retrieval MRR and captioning BERT-F1) are stated as outcomes of empirical evaluation on downstream tasks rather than any mathematical derivation, fitted parameter renamed as prediction, or self-citation chain. No equations appear in the provided text, and the central claims rest on external dataset performance instead of reducing to inputs by construction. This is a standard self-contained empirical contribution with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The physics simulation and transfer assumption are implicit but not formalized.

pith-pipeline@v0.9.0 · 5833 in / 1242 out tokens · 39989 ms · 2026-05-22T06:21:37.514900+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 7 internal anchors

[1]

Aggarwal and M.S

J.K. Aggarwal and M.S. Ryoo. Human activity analysis: A review.ACM Comput. Surv., 43(3), April 2011. ISSN 0360-0300. doi: 10.1145/1922649.1922653. 10

work page doi:10.1145/1922649.1922653 2011
[2]

Comparative study on classifying human activities with miniature inertial and magnetic sensors.Pattern Recognition, 43(10):3605–3620, 2010

Kerem Altun, Billur Barshan, and Orkun Tunçel. Comparative study on classifying human activities with miniature inertial and magnetic sensors.Pattern Recognition, 43(10):3605–3620, 2010

work page 2010
[3]

A public domain dataset for human activity recognition using smartphones

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge Luis Reyes-Ortiz, et al. A public domain dataset for human activity recognition using smartphones. InEsann, volume 3, pages 3–4, 2013

work page 2013
[4]

Llasa: A sensor-aware llm for natural language reasoning of human activity from imu data

Sheikh Asif Imran Shouborno, Mohammad Nur Hossain Khan, Subrata Biswas, and Bashima Islam. Llasa: A sensor-aware llm for natural language reasoning of human activity from imu data. InCompanion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp Companion ’25, page 893–899, New York, NY , USA, 2025. Association for...

work page doi:10.1145/3714394.3756187 2025
[5]

w-har: An activity recognition dataset and framework using low-power wearable devices.Sensors, 20(18):5356, 2020

Ganapati Bhat, Nicholas Tran, Holly Shill, and Umit Y Ogras. w-har: An activity recognition dataset and framework using low-power wearable devices.Sensors, 20(18):5356, 2020

work page 2020
[6]

Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook

Sizhen Bian, Mengxi Liu, Siyu Yuan, Lala Shakti Swarup Ray, Bo Zhou, Bin Guo, Zhiwen Yu, Thomas Ploetz, Paul Lukowicz, and Vitor Fortes Rey. Foundation models defining a new era in sensor-based human activity recognition: A survey and outlook.arXiv preprint arXiv:2604.02711, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[7]

A tutorial on human activity recognition using body-worn inertial sensors.ACM Comput

Andreas Bulling, Ulf Blanke, and Bernt Schiele. A tutorial on human activity recognition using body-worn inertial sensors.ACM Comput. Surv., 46(3), January 2014. ISSN 0360-0300. doi: 10.1145/2499621

work page doi:10.1145/2499621 2014
[8]

Towards generalizable human activity recognition: A survey.arXiv preprint arXiv:2508.12213, 2025

Yize Cai, Baoshen Guo, Flora Salim, and Zhiqing Hong. Towards generalizable human activity recognition: A survey.arXiv preprint arXiv:2508.12213, 2025

work page arXiv 2025
[9]

A systematic study of unsupervised domain adaptation for robust human-activity recognition.Proc

Youngjae Chang, Akhil Mathur, Anton Isopoussu, Junehwa Song, and Fahim Kawsar. A systematic study of unsupervised domain adaptation for robust human-activity recognition.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 4(1), March 2020. doi: 10.1145/3380985

work page doi:10.1145/3380985 2020
[10]

COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition

Baiyu Chen, Wilson Wongso, Zechen Li, Yonchanok Khaokaew, Hao Xue, and Flora Salim. Co- modo: Cross-modal video-to-imu distillation for efficient egocentric human activity recognition. arXiv preprint arXiv:2503.07259, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[11]

Utd-mhad: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor

Chen Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. Utd-mhad: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In2015 IEEE International conference on image processing (ICIP), pages 168–172. IEEE, 2015

work page 2015
[12]

Understanding and using context.Personal and ubiquitous computing, 5(1):4–7, 2001

Anind K Dey. Understanding and using context.Personal and ubiquitous computing, 5(1):4–7, 2001

work page 2001
[13]

Mantisv2: Closing the zero-shot gap in time series classification with synthetic data and test-time strategies

Vasilii Feofanov, Songkang Wen, Jianfeng Zhang, Lujia Pan, and Ievgen Redko. Mantisv2: Closing the zero-shot gap in time series classification with synthetic data and test-time strategies. arXiv preprint arXiv:2602.17868, 2026

work page arXiv 2026
[14]

Learning from the best: Contrastive representations learning across sensor locations for wearable activity recognition

Vitor Fortes Rey, Sungho Suh, and Paul Lukowicz. Learning from the best: Contrastive representations learning across sensor locations for wearable activity recognition. ISWC ’22, page 28–32, New York, NY , USA, 2022. Association for Computing Machinery. ISBN 9781450394246. doi: 10.1145/3544794.3558464

work page doi:10.1145/3544794.3558464 2022
[15]

Scaling deep contrastive learning batch size under memory limited setup

Luyu Gao, Yunyi Zhang, Jiawei Han, and Jamie Callan. Scaling deep contrastive learning batch size under memory limited setup. In Anna Rogers, Iacer Calixto, Ivan Vuli´c, Naomi Saphra, Nora Kassner, Oana-Maria Camburu, Trapit Bansal, and Vered Shwartz, editors,Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), pages 316–321...

work page doi:10.18653/v1/2021.repl4nlp-1.31 2021
[16]

ImageBind: One embedding space to bind them all

Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. ImageBind: One embedding space to bind them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15180–15190, 2023. 11

work page 2023
[17]

Gemma 4 Model Card

Google DeepMind. Gemma 4 Model Card. Technical report, April 2026. Last updated: 2026-04-17

work page 2026
[18]

Ego4D: Around the world in 3,000 hours of egocentric video

Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, et al. Ego4D: Around the world in 3,000 hours of egocentric video. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18995–19012, 2022

work page 2022
[19]

Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyl- los Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, et al. Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1...

work page 2024
[20]

Past, present, and future of sensor-based human activity recognition using wearables: A surveying tutorial on a still challenging task.Proc

Harish Haresamudram, Chi Ian Tang, Sungho Suh, Paul Lukowicz, and Thomas Plötz. Past, present, and future of sensor-based human activity recognition using wearables: A surveying tutorial on a still challenging task.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 9 (2), June 2025. doi: 10.1145/3729467

work page doi:10.1145/3729467 2025
[21]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016
[22]

{DEBERTA}: {DECODING}- {enhanced} {bert} {with} {disentangled} {attention}

Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. {DEBERTA}: {DECODING}- {enhanced} {bert} {with} {disentangled} {attention}. InInternational Conference on Learning Representations, 2021. URLhttps://openreview.net/forum?id=XPZIaotutsD

work page 2021
[23]

Egolm: Multi-modal language model of egocentric motions

Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim, Yuting Ye, Richard Newcombe, Ziwei Liu, and Lingni Ma. Egolm: Multi-modal language model of egocentric motions. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5344–5354, 2025

work page 2025
[24]

Crosshar: Generalizing cross-dataset human activity recognition via hierarchical self-supervised pretraining.Proc

Zhiqing Hong, Zelong Li, Shuxin Zhong, Wenjun Lyu, Haotian Wang, Yi Ding, Tian He, and Desheng Zhang. Crosshar: Generalizing cross-dataset human activity recognition via hierarchical self-supervised pretraining.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 8(2), May 2024. doi: 10.1145/3659597

work page doi:10.1145/3659597 2024
[25]

Graphmae2: A decoding-enhanced masked self-supervised graph learner

Zhenyu Hou, Yufei He, Yukuo Cen, Xiao Liu, Yuxiao Dong, Evgeny Kharlamov, and Jie Tang. Graphmae2: A decoding-enhanced masked self-supervised graph learner. InProceedings of the ACM web conference 2023, pages 737–746, 2023

work page 2023
[26]

Graph contrastive learning for skeleton-based action recognition

Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, and Bin Feng. Graph contrastive learning for skeleton-based action recognition. InThe Eleventh International Conference on Learning Representations,

work page
[27]

URLhttps://openreview.net/forum?id=PLUXnnxUdr4

work page
[28]

Sijie Ji, Xinzhe Zheng, and Chenshu Wu. HARGPT: Are LLMs zero-shot human activity recognizers? In2024 IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys), pages 38–43, 2024. doi: 10.1109/FMSys62467.2024. 00011

work page doi:10.1109/fmsys62467.2024 2024
[29]

Motiongpt: Human motion as a foreign language.Advances in Neural Information Processing Systems, 36:20067–20079, 2023

Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, and Tao Chen. Motiongpt: Human motion as a foreign language.Advances in Neural Information Processing Systems, 36:20067–20079, 2023

work page 2023
[30]

Motionbind: Multi-modal human motion alignment for re- trieval, recognition, and generation

Kaleab A Kinfu and Rene Vidal. Motionbind: Multi-modal human motion alignment for re- trieval, recognition, and generation. InThe Thirty-ninth Annual Conference on Neural Informa- tion Processing Systems, 2025. URLhttps://openreview.net/forum?id=sUjwDdyspc

work page 2025
[31]

Enabling sustainability and energy awareness in schools based on iot and real-world data,

Kai Kunze and Paul Lukowicz. Sensor placement variations in wearable activity recognition. IEEE Pervasive Computing, 13(4):32–41, 2014. doi: 10.1109/MPRV .2014.73. 12

work page doi:10.1109/mprv 2014
[32]

NV-embed: Improved techniques for training LLMs as generalist embedding models

Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, and Wei Ping. NV-embed: Improved techniques for training LLMs as generalist embedding models. InThe Thirteenth International Conference on Learning Representations,

work page
[33]

URLhttps://openreview.net/forum?id=lgsyLSsDRe

work page
[34]

Generating virtual on-body accelerometer data from virtual textual descriptions for human activity recognition

Zikang Leng, Hyeokhyen Kwon, and Thomas Ploetz. Generating virtual on-body accelerometer data from virtual textual descriptions for human activity recognition. InProceedings of the 2023 ACM International Symposium on Wearable Computers, ISWC ’23, page 39–43, New York, NY , USA, 2023. Association for Computing Machinery. ISBN 9798400701993. doi: 10.1145/35...

work page doi:10.1145/3594738.3611361 2023
[35]

Imugpt 2.0: Language-based cross modality transfer for sensor-based human activity recognition.Proc

Zikang Leng, Amitrajit Bhattacharjee, Hrudhai Rajasekhar, Lizhe Zhang, Elizabeth Bruda, Hyeokhyen Kwon, and Thomas Plötz. Imugpt 2.0: Language-based cross modality transfer for sensor-based human activity recognition.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 8(3), September 2024. doi: 10.1145/3678545

work page doi:10.1145/3678545 2024
[36]

3d human action representation learning via cross-view consistency pursuit

Linguo Li, Minsi Wang, Bingbing Ni, Hang Wang, Jiancheng Yang, and Wenjun Zhang. 3d human action representation learning via cross-view consistency pursuit. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4741–4750, 2021

work page 2021
[37]

ZARA: Training-Free Motion Time-Series Reasoning via Evidence-Grounded LLM Agents

Zechen Li, Baiyu Chen, Hao Xue, and Flora D Salim. Zara: Training-free motion time-series reasoning via evidence-grounded llm agents.arXiv preprint arXiv:2508.04038, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[38]

Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, and Flora D. Salim. SensorLLM: Aligning large language models with motion sensors for human activity recognition. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, edi- tors,Proceedings of the 2025 Conference on Empirical Methods in Natural Language Pro- cessing, pages 354–...

work page doi:10.18653/v1/2025.emnlp-main.19 2025
[39]

Actionlet-dependent contrastive learning for unsupervised skeleton-based action recognition

Lilang Lin, Jiahang Zhang, and Jiaying Liu. Actionlet-dependent contrastive learning for unsupervised skeleton-based action recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2363–2372, 2023

work page 2023
[40]

Gpt understands, too.AI open, 5:208–215, 2024

Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. Gpt understands, too.AI open, 5:208–215, 2024

work page 2024
[41]

Toward foundation model for multivariate wearable sensing of physiological signals.ACM Trans

Yunfei Luo, Yuliang Chen, Asif Salekin, and Tauhidur Rahman. Toward foundation model for multivariate wearable sensing of physiological signals.ACM Trans. Comput. Healthcare, March 2026. doi: 10.1145/3803808

work page doi:10.1145/3803808 2026
[42]

Nymeria: A massive collection of multimodal egocentric daily motion in the wild

Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, et al. Nymeria: A massive collection of multimodal egocentric daily motion in the wild. InEuropean Conference on Computer Vision, pages 445–465. Springer, 2024

work page 2024
[43]

Masked motion predictors are strong 3d action representation learners

Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, and Houqiang Li. Masked motion predictors are strong 3d action representation learners. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10181–10191, 2023

work page 2023
[44]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426, 2018. URL https: //arxiv.org/abs/1802.03426

work page internal anchor Pith review Pith/arXiv arXiv 2018
[45]

Wonderwall: A virtual-to-real foundation model for imu- based har.Proc

Shenghuan Miao and Ling Chen. Wonderwall: A virtual-to-real foundation model for imu- based har.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 10(1), March 2026. doi: 10.1145/3789688

work page doi:10.1145/3789688 2026
[46]

IMU2CLIP: Language-grounded motion sensor translation with multimodal con- trastive learning

Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Aparajita Saraf, Amy Bearman, and Babak Damavandi. IMU2CLIP: Language-grounded motion sensor translation with multimodal con- trastive learning. In Houda Bouamor, Juan Pino, and Kalika Bali, editors,Findings of the Associ- ation for Computational Linguistics: EMNLP 2023, pages 13246–13253, Singapore, December 13

work page 2023
[47]

doi: 10.18653/v1/2023.findings-emnlp.883

Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-emnlp.883. URLhttps://aclanthology.org/2023.findings-emnlp.883/

work page doi:10.18653/v1/2023.findings-emnlp.883 2023
[48]

Generative representational instruction tuning

Niklas Muennighoff, Hongjin SU, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, and Douwe Kiela. Generative representational instruction tuning. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview. net/forum?id=BC4lIvfSzv

work page 2025
[49]

Mobind: Motion binding for fine-grained imu-video pose alignment.arXiv preprint arXiv:2602.19004, 2026

Duc Duy Nguyen, Tat-Jun Chin, and Minh Hoai. Mobind: Motion binding for fine-grained imu-video pose alignment.arXiv preprint arXiv:2602.19004, 2026

work page arXiv 2026
[50]

Wimusim: simulat- ing realistic variabilities in wearable imus for human activity recognition.Frontiers in Computer Science, V olume 7 - 2025, 2025

Nobuyuki Oishi, Phil Birch, Daniel Roggen, and Paula Lago. Wimusim: simulat- ing realistic variabilities in wearable imus for human activity recognition.Frontiers in Computer Science, V olume 7 - 2025, 2025. ISSN 2624-9898. doi: 10.3389/fcomp. 2025.1514933. URL https://www.frontiersin.org/journals/computer-science/ articles/10.3389/fcomp.2025.1514933

work page doi:10.3389/fcomp 2025
[51]

gpt-oss-120b & gpt-oss-20b Model Card

OpenAI. gpt-oss-120b & gpt-oss-20b Model Card, 2025. URL https://arxiv.org/abs/ 2508.10925

work page internal anchor Pith review Pith/arXiv arXiv 2025
[52]

GPT-5.4 Thinking System Card

OpenAI. GPT-5.4 Thinking System Card. Technical report, OpenAI, March 2026. URL https: //deploymentsafety.openai.com/gpt-5-4-thinking/gpt-5-4-thinking.pdf . Sys- tem card

work page 2026
[53]

Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition

Francisco Javier Ordóñez and Daniel Roggen. Deep convolutional and lstm recurrent neural networks for multimodal wearable activity recognition.Sensors, 16(1), 2016. ISSN 1424-8220. doi: 10.3390/s16010115. URLhttps://www.mdpi.com/1424-8220/16/1/115

work page doi:10.3390/s16010115 2016
[54]

Qwen2.5 Technical Report

Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[55]

W2w: A simulated exploration of imu placement across the human body for designing smarter wearable

Lala Shakti Swarup Ray, Bo Zhou, and Paul Lukowicz. W2w: A simulated exploration of imu placement across the human body for designing smarter wearable. InProceedings of the 2025 ACM International Symposium on Wearable Computers, ISWC ’25, page 170–176, New York, NY , USA, 2025. Association for Computing Machinery. ISBN 9798400714818. doi: 10.1145/3715071.3750417

work page doi:10.1145/3715071.3750417 2025
[56]

Introducing a new benchmarked dataset for activity monitoring

Attila Reiss and Didier Stricker. Introducing a new benchmarked dataset for activity monitoring. In2012 16th international symposium on wearable computers, pages 108–109. IEEE, 2012

work page 2012
[57]

Collecting complex activity datasets in highly rich networked sensor environments

Daniel Roggen, Alberto Calatroni, Mirco Rossi, Thomas Holleczek, Kilian Förster, Gerhard Tröster, Paul Lukowicz, David Bannach, Gerald Pirkl, Alois Ferscha, et al. Collecting complex activity datasets in highly rich networked sensor environments. In2010 Seventh international conference on networked sensing systems (INSS), pages 233–240. IEEE, 2010

work page 2010
[58]

Human motion.Understanding, Modeling, Capture, 2008

Bodo Rosenhahn, Reinhard Klette, and Dimitris Metaxas. Human motion.Understanding, Modeling, Capture, 2008

work page 2008
[59]

Context-aware computing applications

Bill Schilit, Norman Adams, and Roy Want. Context-aware computing applications. In1994 first workshop on mobile computing systems and applications, pages 85–90. IEEE, 1994

work page 1994
[60]

InProceedings of the 13th ACM Conference on Embedded Networked Sensor Systems(Seoul, South Korea)(Sen- Sys ’15)

Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjær- gaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. Smart devices are different: Assess- ing and mitigatingmobile sensing heterogeneities for activity recognition. InProceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys ’15, page 127–...

work page doi:10.1145/2809695.2809718 2015
[61]

A systematic review of smartphone-based human activity recognition methods for health research.npj Digital Medicine, 4(1):148, 2021

Marcin Straczkiewicz, Peter James, and Jukka-Pekka Onnela. A systematic review of smartphone-based human activity recognition methods for health research.npj Digital Medicine, 4(1):148, 2021. ISSN 2398-6352. doi: 10.1038/s41746-021-00514-4. URL https://doi.org/10.1038/s41746-021-00514-4

work page doi:10.1038/s41746-021-00514-4 2021
[62]

Imuzero: Zero-shot human activity recognition by language-based cross modality fusion.Proc

Jie Su, Fengtong Ge, Zhenyu Wen, Taotao Li, Yang Bai, Yejian Zhou, and Xiaoqin Zhang. Imuzero: Zero-shot human activity recognition by language-based cross modality fusion.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 9(4), December 2025. doi: 10.1145/ 3770669

work page 2025
[63]

On-body localization of wearable devices: An investigation of position-aware activity recognition

Timo Sztyler and Heiner Stuckenschmidt. On-body localization of wearable devices: An investigation of position-aware activity recognition. In2016 IEEE international conference on pervasive computing and communications (PerCom), pages 1–9. IEEE, 2016

work page 2016
[64]

On-body localization of wearable devices: An investigation of position-aware activity recognition

Timo Sztyler and Heiner Stuckenschmidt. On-body localization of wearable devices: An investigation of position-aware activity recognition. In2016 IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 1–9, 2016. doi: 10.1109/ PERCOM.2016.7456521

work page arXiv 2016
[65]

Yue Tan, Xiaoqian Hu, Hao Xue, Celso M de Melo, and Flora D. Salim. Bisecle: Binding and separation in continual learning for video language understanding. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview. net/forum?id=o6keqobP13

work page 2025
[66]

Catherine Tong, Jinchen Ge, and Nicholas D. Lane. Zero-shot learning for imu-based activity recognition using video embeddings.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 5(4), December 2022. doi: 10.1145/3494995

work page doi:10.1145/3494995 2022
[67]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

work page 2017
[68]

Ubiphysio: Support daily functioning, fitness, and rehabilitation with action understanding and feedback in natural language.Proc

Chongyang Wang, Yuan Feng, Lingxiao Zhong, Siyi Zhu, Chi Zhang, Siqi Zheng, Chen Liang, Yuntao Wang, Chengqi He, Chun Yu, and Yuanchun Shi. Ubiphysio: Support daily functioning, fitness, and rehabilitation with action understanding and feedback in natural language.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 8(1), March 2024. doi: 10.1145/3643552

work page doi:10.1145/3643552 2024
[69]

Ego4o: Egocentric human motion capture and understanding from multi-modal input

Jian Wang, Rishabh Dabral, Diogo Luvizon, Zhe Cao, Lingjie Liu, Thabo Beeler, and Christian Theobalt. Ego4o: Egocentric human motion capture and understanding from multi-modal input. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 22668–22679, 2025

work page 2025
[70]

One model to fit them all: Universal imu-based human activity recognition with llm-assisted cross-dataset representation.Proc

Qingxin Wei, Jiaming Huang, Yi Gao, and Wei Dong. One model to fit them all: Universal imu-based human activity recognition with llm-assisted cross-dataset representation.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 9(3), September 2025. doi: 10.1145/3749509

work page doi:10.1145/3749509 2025
[71]

Gary M Weiss. Wisdm smartphone and smartwatch activity and biometrics dataset.UCI Machine Learning Repository: WISDM Smartphone and Smartwatch Activity and Biometrics Dataset Data Set, 7(133190-133202):5, 2019

work page 2019
[72]

Towards continual egocentric activity recognition: A multi-modal egocentric activity dataset for continual learning.IEEE Transactions on Multimedia, 26: 2430–2443, 2024

Linfeng Xu, Qingbo Wu, Lili Pan, Fanman Meng, Hongliang Li, Chiyuan He, Hanxin Wang, Shaoxu Cheng, and Yu Dai. Towards continual egocentric activity recognition: A multi-modal egocentric activity dataset for continual learning.IEEE Transactions on Multimedia, 26: 2430–2443, 2024. doi: 10.1109/TMM.2023.3295899

work page doi:10.1109/tmm.2023.3295899 2024
[73]

Relcon: Relative contrastive learning for a motion foundation model for wearable data

Maxwell A Xu, Jaya Narain, Gregory Darnell, Haraldur T Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Andres Fineman, Karthik Jayaraman Raghuram, James Matthew Rehg, and Shirley You Ren. Relcon: Relative contrastive learning for a motion foundation model for wearable data. InThe Thirteenth International Conference on Learning Representations, 2025. URL...

work page 2025
[74]

Skeletonmae: graph- based masked autoencoder for skeleton sequence pre-training

Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, and Liang Lin. Skeletonmae: graph- based masked autoencoder for skeleton sequence pre-training. InProceedings of the IEEE/CVF international conference on computer vision, pages 5606–5618, 2023. 15

work page 2023
[75]

Spatial temporal graph convolutional networks for skeleton-based action recognition

Sijie Yan, Yuanjun Xiong, and Dahua Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018

work page 2018
[76]

Tnda-har, 2021

Yan Yan, Dali Chen, Yushi Liu, Jinjin Zhao, Bo Wang, Xuankun Wu, Xiaohao Jiao, Yuqian Chen, Huihui Li, and Xuchao Ren. Tnda-har, 2021. URL https://dx.doi.org/10.21227/ 4epb-pg26

work page 2021
[77]

Openpack: A large- scale dataset for recognizing packaging works in iot-enabled logistic environments

Naoya Yoshimura, Jaime Morales, Takuya Maekawa, and Takahiro Hara. Openpack: A large- scale dataset for recognizing packaging works in iot-enabled logistic environments. In2024 IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 90–97, 2024. doi: 10.1109/PerCom59722.2024.10494448

work page doi:10.1109/percom59722.2024.10494448 2024
[78]

Cafe: Unifying representation and generation with contrastive-autoregressive finetuning

Hao Yu, Zhuokai Zhao, Shen Yan, Lukasz Korycki, Jianyu Wang, Baosheng He, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, and Hanchao Yu. Cafe: Unifying representation and generation with contrastive-autoregressive finetuning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6286–6297, 2025

work page 2025
[79]

MoPFormer: Motion- primitive transformer for wearable-sensor activity recognition

Hao Zhang, Zhan Zhuang, Xuehao Wang, Xiaodong Yang, and Yu Zhang. MoPFormer: Motion- primitive transformer for wearable-sensor activity recognition. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview. net/forum?id=Ty9n72fZ1K

work page 2025
[80]

Usc-had: A daily activity dataset for ubiquitous activity recognition using wearable sensors

Mi Zhang and Alexander A Sawchuk. Usc-had: A daily activity dataset for ubiquitous activity recognition using wearable sensors. InProceedings of the 2012 ACM conference on ubiquitous computing, pages 1036–1043, 2012

work page 2012

Showing first 80 references.

[1] [1]

Aggarwal and M.S

J.K. Aggarwal and M.S. Ryoo. Human activity analysis: A review.ACM Comput. Surv., 43(3), April 2011. ISSN 0360-0300. doi: 10.1145/1922649.1922653. 10

work page doi:10.1145/1922649.1922653 2011

[2] [2]

Comparative study on classifying human activities with miniature inertial and magnetic sensors.Pattern Recognition, 43(10):3605–3620, 2010

Kerem Altun, Billur Barshan, and Orkun Tunçel. Comparative study on classifying human activities with miniature inertial and magnetic sensors.Pattern Recognition, 43(10):3605–3620, 2010

work page 2010

[3] [3]

A public domain dataset for human activity recognition using smartphones

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge Luis Reyes-Ortiz, et al. A public domain dataset for human activity recognition using smartphones. InEsann, volume 3, pages 3–4, 2013

work page 2013

[4] [4]

Llasa: A sensor-aware llm for natural language reasoning of human activity from imu data

Sheikh Asif Imran Shouborno, Mohammad Nur Hossain Khan, Subrata Biswas, and Bashima Islam. Llasa: A sensor-aware llm for natural language reasoning of human activity from imu data. InCompanion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp Companion ’25, page 893–899, New York, NY , USA, 2025. Association for...

work page doi:10.1145/3714394.3756187 2025

[5] [5]

w-har: An activity recognition dataset and framework using low-power wearable devices.Sensors, 20(18):5356, 2020

Ganapati Bhat, Nicholas Tran, Holly Shill, and Umit Y Ogras. w-har: An activity recognition dataset and framework using low-power wearable devices.Sensors, 20(18):5356, 2020

work page 2020

[6] [6]

Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook

Sizhen Bian, Mengxi Liu, Siyu Yuan, Lala Shakti Swarup Ray, Bo Zhou, Bin Guo, Zhiwen Yu, Thomas Ploetz, Paul Lukowicz, and Vitor Fortes Rey. Foundation models defining a new era in sensor-based human activity recognition: A survey and outlook.arXiv preprint arXiv:2604.02711, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[7] [7]

A tutorial on human activity recognition using body-worn inertial sensors.ACM Comput

Andreas Bulling, Ulf Blanke, and Bernt Schiele. A tutorial on human activity recognition using body-worn inertial sensors.ACM Comput. Surv., 46(3), January 2014. ISSN 0360-0300. doi: 10.1145/2499621

work page doi:10.1145/2499621 2014

[8] [8]

Towards generalizable human activity recognition: A survey.arXiv preprint arXiv:2508.12213, 2025

Yize Cai, Baoshen Guo, Flora Salim, and Zhiqing Hong. Towards generalizable human activity recognition: A survey.arXiv preprint arXiv:2508.12213, 2025

work page arXiv 2025

[9] [9]

A systematic study of unsupervised domain adaptation for robust human-activity recognition.Proc

Youngjae Chang, Akhil Mathur, Anton Isopoussu, Junehwa Song, and Fahim Kawsar. A systematic study of unsupervised domain adaptation for robust human-activity recognition.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 4(1), March 2020. doi: 10.1145/3380985

work page doi:10.1145/3380985 2020

[10] [10]

COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition

Baiyu Chen, Wilson Wongso, Zechen Li, Yonchanok Khaokaew, Hao Xue, and Flora Salim. Co- modo: Cross-modal video-to-imu distillation for efficient egocentric human activity recognition. arXiv preprint arXiv:2503.07259, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[11] [11]

Utd-mhad: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor

Chen Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. Utd-mhad: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In2015 IEEE International conference on image processing (ICIP), pages 168–172. IEEE, 2015

work page 2015

[12] [12]

Understanding and using context.Personal and ubiquitous computing, 5(1):4–7, 2001

Anind K Dey. Understanding and using context.Personal and ubiquitous computing, 5(1):4–7, 2001

work page 2001

[13] [13]

Mantisv2: Closing the zero-shot gap in time series classification with synthetic data and test-time strategies

Vasilii Feofanov, Songkang Wen, Jianfeng Zhang, Lujia Pan, and Ievgen Redko. Mantisv2: Closing the zero-shot gap in time series classification with synthetic data and test-time strategies. arXiv preprint arXiv:2602.17868, 2026

work page arXiv 2026

[14] [14]

Learning from the best: Contrastive representations learning across sensor locations for wearable activity recognition

Vitor Fortes Rey, Sungho Suh, and Paul Lukowicz. Learning from the best: Contrastive representations learning across sensor locations for wearable activity recognition. ISWC ’22, page 28–32, New York, NY , USA, 2022. Association for Computing Machinery. ISBN 9781450394246. doi: 10.1145/3544794.3558464

work page doi:10.1145/3544794.3558464 2022

[15] [15]

Scaling deep contrastive learning batch size under memory limited setup

Luyu Gao, Yunyi Zhang, Jiawei Han, and Jamie Callan. Scaling deep contrastive learning batch size under memory limited setup. In Anna Rogers, Iacer Calixto, Ivan Vuli´c, Naomi Saphra, Nora Kassner, Oana-Maria Camburu, Trapit Bansal, and Vered Shwartz, editors,Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), pages 316–321...

work page doi:10.18653/v1/2021.repl4nlp-1.31 2021

[16] [16]

ImageBind: One embedding space to bind them all

Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. ImageBind: One embedding space to bind them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15180–15190, 2023. 11

work page 2023

[17] [17]

Gemma 4 Model Card

Google DeepMind. Gemma 4 Model Card. Technical report, April 2026. Last updated: 2026-04-17

work page 2026

[18] [18]

Ego4D: Around the world in 3,000 hours of egocentric video

Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, et al. Ego4D: Around the world in 3,000 hours of egocentric video. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18995–19012, 2022

work page 2022

[19] [19]

Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyl- los Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, et al. Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1...

work page 2024

[20] [20]

Past, present, and future of sensor-based human activity recognition using wearables: A surveying tutorial on a still challenging task.Proc

Harish Haresamudram, Chi Ian Tang, Sungho Suh, Paul Lukowicz, and Thomas Plötz. Past, present, and future of sensor-based human activity recognition using wearables: A surveying tutorial on a still challenging task.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 9 (2), June 2025. doi: 10.1145/3729467

work page doi:10.1145/3729467 2025

[21] [21]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016

[22] [22]

{DEBERTA}: {DECODING}- {enhanced} {bert} {with} {disentangled} {attention}

Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. {DEBERTA}: {DECODING}- {enhanced} {bert} {with} {disentangled} {attention}. InInternational Conference on Learning Representations, 2021. URLhttps://openreview.net/forum?id=XPZIaotutsD

work page 2021

[23] [23]

Egolm: Multi-modal language model of egocentric motions

Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim, Yuting Ye, Richard Newcombe, Ziwei Liu, and Lingni Ma. Egolm: Multi-modal language model of egocentric motions. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5344–5354, 2025

work page 2025

[24] [24]

Crosshar: Generalizing cross-dataset human activity recognition via hierarchical self-supervised pretraining.Proc

Zhiqing Hong, Zelong Li, Shuxin Zhong, Wenjun Lyu, Haotian Wang, Yi Ding, Tian He, and Desheng Zhang. Crosshar: Generalizing cross-dataset human activity recognition via hierarchical self-supervised pretraining.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 8(2), May 2024. doi: 10.1145/3659597

work page doi:10.1145/3659597 2024

[25] [25]

Graphmae2: A decoding-enhanced masked self-supervised graph learner

Zhenyu Hou, Yufei He, Yukuo Cen, Xiao Liu, Yuxiao Dong, Evgeny Kharlamov, and Jie Tang. Graphmae2: A decoding-enhanced masked self-supervised graph learner. InProceedings of the ACM web conference 2023, pages 737–746, 2023

work page 2023

[26] [26]

Graph contrastive learning for skeleton-based action recognition

Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, and Bin Feng. Graph contrastive learning for skeleton-based action recognition. InThe Eleventh International Conference on Learning Representations,

work page

[27] [27]

URLhttps://openreview.net/forum?id=PLUXnnxUdr4

work page

[28] [28]

Sijie Ji, Xinzhe Zheng, and Chenshu Wu. HARGPT: Are LLMs zero-shot human activity recognizers? In2024 IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys), pages 38–43, 2024. doi: 10.1109/FMSys62467.2024. 00011

work page doi:10.1109/fmsys62467.2024 2024

[29] [29]

Motiongpt: Human motion as a foreign language.Advances in Neural Information Processing Systems, 36:20067–20079, 2023

Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, and Tao Chen. Motiongpt: Human motion as a foreign language.Advances in Neural Information Processing Systems, 36:20067–20079, 2023

work page 2023

[30] [30]

Motionbind: Multi-modal human motion alignment for re- trieval, recognition, and generation

Kaleab A Kinfu and Rene Vidal. Motionbind: Multi-modal human motion alignment for re- trieval, recognition, and generation. InThe Thirty-ninth Annual Conference on Neural Informa- tion Processing Systems, 2025. URLhttps://openreview.net/forum?id=sUjwDdyspc

work page 2025

[31] [31]

Enabling sustainability and energy awareness in schools based on iot and real-world data,

Kai Kunze and Paul Lukowicz. Sensor placement variations in wearable activity recognition. IEEE Pervasive Computing, 13(4):32–41, 2014. doi: 10.1109/MPRV .2014.73. 12

work page doi:10.1109/mprv 2014

[32] [32]

NV-embed: Improved techniques for training LLMs as generalist embedding models

Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, and Wei Ping. NV-embed: Improved techniques for training LLMs as generalist embedding models. InThe Thirteenth International Conference on Learning Representations,

work page

[33] [33]

URLhttps://openreview.net/forum?id=lgsyLSsDRe

work page

[34] [34]

Generating virtual on-body accelerometer data from virtual textual descriptions for human activity recognition

Zikang Leng, Hyeokhyen Kwon, and Thomas Ploetz. Generating virtual on-body accelerometer data from virtual textual descriptions for human activity recognition. InProceedings of the 2023 ACM International Symposium on Wearable Computers, ISWC ’23, page 39–43, New York, NY , USA, 2023. Association for Computing Machinery. ISBN 9798400701993. doi: 10.1145/35...

work page doi:10.1145/3594738.3611361 2023

[35] [35]

Imugpt 2.0: Language-based cross modality transfer for sensor-based human activity recognition.Proc

Zikang Leng, Amitrajit Bhattacharjee, Hrudhai Rajasekhar, Lizhe Zhang, Elizabeth Bruda, Hyeokhyen Kwon, and Thomas Plötz. Imugpt 2.0: Language-based cross modality transfer for sensor-based human activity recognition.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 8(3), September 2024. doi: 10.1145/3678545

work page doi:10.1145/3678545 2024

[36] [36]

3d human action representation learning via cross-view consistency pursuit

Linguo Li, Minsi Wang, Bingbing Ni, Hang Wang, Jiancheng Yang, and Wenjun Zhang. 3d human action representation learning via cross-view consistency pursuit. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4741–4750, 2021

work page 2021

[37] [37]

ZARA: Training-Free Motion Time-Series Reasoning via Evidence-Grounded LLM Agents

Zechen Li, Baiyu Chen, Hao Xue, and Flora D Salim. Zara: Training-free motion time-series reasoning via evidence-grounded llm agents.arXiv preprint arXiv:2508.04038, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[38] [38]

Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, and Flora D. Salim. SensorLLM: Aligning large language models with motion sensors for human activity recognition. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, edi- tors,Proceedings of the 2025 Conference on Empirical Methods in Natural Language Pro- cessing, pages 354–...

work page doi:10.18653/v1/2025.emnlp-main.19 2025

[39] [39]

Actionlet-dependent contrastive learning for unsupervised skeleton-based action recognition

Lilang Lin, Jiahang Zhang, and Jiaying Liu. Actionlet-dependent contrastive learning for unsupervised skeleton-based action recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2363–2372, 2023

work page 2023

[40] [40]

Gpt understands, too.AI open, 5:208–215, 2024

Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. Gpt understands, too.AI open, 5:208–215, 2024

work page 2024

[41] [41]

Toward foundation model for multivariate wearable sensing of physiological signals.ACM Trans

Yunfei Luo, Yuliang Chen, Asif Salekin, and Tauhidur Rahman. Toward foundation model for multivariate wearable sensing of physiological signals.ACM Trans. Comput. Healthcare, March 2026. doi: 10.1145/3803808

work page doi:10.1145/3803808 2026

[42] [42]

Nymeria: A massive collection of multimodal egocentric daily motion in the wild

Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, et al. Nymeria: A massive collection of multimodal egocentric daily motion in the wild. InEuropean Conference on Computer Vision, pages 445–465. Springer, 2024

work page 2024

[43] [43]

Masked motion predictors are strong 3d action representation learners

Yunyao Mao, Jiajun Deng, Wengang Zhou, Yao Fang, Wanli Ouyang, and Houqiang Li. Masked motion predictors are strong 3d action representation learners. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10181–10191, 2023

work page 2023

[44] [44]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction.arXiv preprint arXiv:1802.03426, 2018. URL https: //arxiv.org/abs/1802.03426

work page internal anchor Pith review Pith/arXiv arXiv 2018

[45] [45]

Wonderwall: A virtual-to-real foundation model for imu- based har.Proc

Shenghuan Miao and Ling Chen. Wonderwall: A virtual-to-real foundation model for imu- based har.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 10(1), March 2026. doi: 10.1145/3789688

work page doi:10.1145/3789688 2026

[46] [46]

IMU2CLIP: Language-grounded motion sensor translation with multimodal con- trastive learning

Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Aparajita Saraf, Amy Bearman, and Babak Damavandi. IMU2CLIP: Language-grounded motion sensor translation with multimodal con- trastive learning. In Houda Bouamor, Juan Pino, and Kalika Bali, editors,Findings of the Associ- ation for Computational Linguistics: EMNLP 2023, pages 13246–13253, Singapore, December 13

work page 2023

[47] [47]

doi: 10.18653/v1/2023.findings-emnlp.883

Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-emnlp.883. URLhttps://aclanthology.org/2023.findings-emnlp.883/

work page doi:10.18653/v1/2023.findings-emnlp.883 2023

[48] [48]

Generative representational instruction tuning

Niklas Muennighoff, Hongjin SU, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, and Douwe Kiela. Generative representational instruction tuning. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview. net/forum?id=BC4lIvfSzv

work page 2025

[49] [49]

Mobind: Motion binding for fine-grained imu-video pose alignment.arXiv preprint arXiv:2602.19004, 2026

Duc Duy Nguyen, Tat-Jun Chin, and Minh Hoai. Mobind: Motion binding for fine-grained imu-video pose alignment.arXiv preprint arXiv:2602.19004, 2026

work page arXiv 2026

[50] [50]

Wimusim: simulat- ing realistic variabilities in wearable imus for human activity recognition.Frontiers in Computer Science, V olume 7 - 2025, 2025

Nobuyuki Oishi, Phil Birch, Daniel Roggen, and Paula Lago. Wimusim: simulat- ing realistic variabilities in wearable imus for human activity recognition.Frontiers in Computer Science, V olume 7 - 2025, 2025. ISSN 2624-9898. doi: 10.3389/fcomp. 2025.1514933. URL https://www.frontiersin.org/journals/computer-science/ articles/10.3389/fcomp.2025.1514933

work page doi:10.3389/fcomp 2025

[51] [51]

gpt-oss-120b & gpt-oss-20b Model Card

OpenAI. gpt-oss-120b & gpt-oss-20b Model Card, 2025. URL https://arxiv.org/abs/ 2508.10925

work page internal anchor Pith review Pith/arXiv arXiv 2025

[52] [52]

GPT-5.4 Thinking System Card

OpenAI. GPT-5.4 Thinking System Card. Technical report, OpenAI, March 2026. URL https: //deploymentsafety.openai.com/gpt-5-4-thinking/gpt-5-4-thinking.pdf . Sys- tem card

work page 2026

[53] [53]

Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition

Francisco Javier Ordóñez and Daniel Roggen. Deep convolutional and lstm recurrent neural networks for multimodal wearable activity recognition.Sensors, 16(1), 2016. ISSN 1424-8220. doi: 10.3390/s16010115. URLhttps://www.mdpi.com/1424-8220/16/1/115

work page doi:10.3390/s16010115 2016

[54] [54]

Qwen2.5 Technical Report

Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[55] [55]

W2w: A simulated exploration of imu placement across the human body for designing smarter wearable

Lala Shakti Swarup Ray, Bo Zhou, and Paul Lukowicz. W2w: A simulated exploration of imu placement across the human body for designing smarter wearable. InProceedings of the 2025 ACM International Symposium on Wearable Computers, ISWC ’25, page 170–176, New York, NY , USA, 2025. Association for Computing Machinery. ISBN 9798400714818. doi: 10.1145/3715071.3750417

work page doi:10.1145/3715071.3750417 2025

[56] [56]

Introducing a new benchmarked dataset for activity monitoring

Attila Reiss and Didier Stricker. Introducing a new benchmarked dataset for activity monitoring. In2012 16th international symposium on wearable computers, pages 108–109. IEEE, 2012

work page 2012

[57] [57]

Collecting complex activity datasets in highly rich networked sensor environments

Daniel Roggen, Alberto Calatroni, Mirco Rossi, Thomas Holleczek, Kilian Förster, Gerhard Tröster, Paul Lukowicz, David Bannach, Gerald Pirkl, Alois Ferscha, et al. Collecting complex activity datasets in highly rich networked sensor environments. In2010 Seventh international conference on networked sensing systems (INSS), pages 233–240. IEEE, 2010

work page 2010

[58] [58]

Human motion.Understanding, Modeling, Capture, 2008

Bodo Rosenhahn, Reinhard Klette, and Dimitris Metaxas. Human motion.Understanding, Modeling, Capture, 2008

work page 2008

[59] [59]

Context-aware computing applications

Bill Schilit, Norman Adams, and Roy Want. Context-aware computing applications. In1994 first workshop on mobile computing systems and applications, pages 85–90. IEEE, 1994

work page 1994

[60] [60]

InProceedings of the 13th ACM Conference on Embedded Networked Sensor Systems(Seoul, South Korea)(Sen- Sys ’15)

Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjær- gaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. Smart devices are different: Assess- ing and mitigatingmobile sensing heterogeneities for activity recognition. InProceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys ’15, page 127–...

work page doi:10.1145/2809695.2809718 2015

[61] [61]

A systematic review of smartphone-based human activity recognition methods for health research.npj Digital Medicine, 4(1):148, 2021

Marcin Straczkiewicz, Peter James, and Jukka-Pekka Onnela. A systematic review of smartphone-based human activity recognition methods for health research.npj Digital Medicine, 4(1):148, 2021. ISSN 2398-6352. doi: 10.1038/s41746-021-00514-4. URL https://doi.org/10.1038/s41746-021-00514-4

work page doi:10.1038/s41746-021-00514-4 2021

[62] [62]

Imuzero: Zero-shot human activity recognition by language-based cross modality fusion.Proc

Jie Su, Fengtong Ge, Zhenyu Wen, Taotao Li, Yang Bai, Yejian Zhou, and Xiaoqin Zhang. Imuzero: Zero-shot human activity recognition by language-based cross modality fusion.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 9(4), December 2025. doi: 10.1145/ 3770669

work page 2025

[63] [63]

On-body localization of wearable devices: An investigation of position-aware activity recognition

Timo Sztyler and Heiner Stuckenschmidt. On-body localization of wearable devices: An investigation of position-aware activity recognition. In2016 IEEE international conference on pervasive computing and communications (PerCom), pages 1–9. IEEE, 2016

work page 2016

[64] [64]

On-body localization of wearable devices: An investigation of position-aware activity recognition

Timo Sztyler and Heiner Stuckenschmidt. On-body localization of wearable devices: An investigation of position-aware activity recognition. In2016 IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 1–9, 2016. doi: 10.1109/ PERCOM.2016.7456521

work page arXiv 2016

[65] [65]

Yue Tan, Xiaoqian Hu, Hao Xue, Celso M de Melo, and Flora D. Salim. Bisecle: Binding and separation in continual learning for video language understanding. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview. net/forum?id=o6keqobP13

work page 2025

[66] [66]

Catherine Tong, Jinchen Ge, and Nicholas D. Lane. Zero-shot learning for imu-based activity recognition using video embeddings.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 5(4), December 2022. doi: 10.1145/3494995

work page doi:10.1145/3494995 2022

[67] [67]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

work page 2017

[68] [68]

Ubiphysio: Support daily functioning, fitness, and rehabilitation with action understanding and feedback in natural language.Proc

Chongyang Wang, Yuan Feng, Lingxiao Zhong, Siyi Zhu, Chi Zhang, Siqi Zheng, Chen Liang, Yuntao Wang, Chengqi He, Chun Yu, and Yuanchun Shi. Ubiphysio: Support daily functioning, fitness, and rehabilitation with action understanding and feedback in natural language.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 8(1), March 2024. doi: 10.1145/3643552

work page doi:10.1145/3643552 2024

[69] [69]

Ego4o: Egocentric human motion capture and understanding from multi-modal input

Jian Wang, Rishabh Dabral, Diogo Luvizon, Zhe Cao, Lingjie Liu, Thabo Beeler, and Christian Theobalt. Ego4o: Egocentric human motion capture and understanding from multi-modal input. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 22668–22679, 2025

work page 2025

[70] [70]

One model to fit them all: Universal imu-based human activity recognition with llm-assisted cross-dataset representation.Proc

Qingxin Wei, Jiaming Huang, Yi Gao, and Wei Dong. One model to fit them all: Universal imu-based human activity recognition with llm-assisted cross-dataset representation.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 9(3), September 2025. doi: 10.1145/3749509

work page doi:10.1145/3749509 2025

[71] [71]

Gary M Weiss. Wisdm smartphone and smartwatch activity and biometrics dataset.UCI Machine Learning Repository: WISDM Smartphone and Smartwatch Activity and Biometrics Dataset Data Set, 7(133190-133202):5, 2019

work page 2019

[72] [72]

Towards continual egocentric activity recognition: A multi-modal egocentric activity dataset for continual learning.IEEE Transactions on Multimedia, 26: 2430–2443, 2024

Linfeng Xu, Qingbo Wu, Lili Pan, Fanman Meng, Hongliang Li, Chiyuan He, Hanxin Wang, Shaoxu Cheng, and Yu Dai. Towards continual egocentric activity recognition: A multi-modal egocentric activity dataset for continual learning.IEEE Transactions on Multimedia, 26: 2430–2443, 2024. doi: 10.1109/TMM.2023.3295899

work page doi:10.1109/tmm.2023.3295899 2024

[73] [73]

Relcon: Relative contrastive learning for a motion foundation model for wearable data

Maxwell A Xu, Jaya Narain, Gregory Darnell, Haraldur T Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Andres Fineman, Karthik Jayaraman Raghuram, James Matthew Rehg, and Shirley You Ren. Relcon: Relative contrastive learning for a motion foundation model for wearable data. InThe Thirteenth International Conference on Learning Representations, 2025. URL...

work page 2025

[74] [74]

Skeletonmae: graph- based masked autoencoder for skeleton sequence pre-training

Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, and Liang Lin. Skeletonmae: graph- based masked autoencoder for skeleton sequence pre-training. InProceedings of the IEEE/CVF international conference on computer vision, pages 5606–5618, 2023. 15

work page 2023

[75] [75]

Spatial temporal graph convolutional networks for skeleton-based action recognition

Sijie Yan, Yuanjun Xiong, and Dahua Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018

work page 2018

[76] [76]

Tnda-har, 2021

Yan Yan, Dali Chen, Yushi Liu, Jinjin Zhao, Bo Wang, Xuankun Wu, Xiaohao Jiao, Yuqian Chen, Huihui Li, and Xuchao Ren. Tnda-har, 2021. URL https://dx.doi.org/10.21227/ 4epb-pg26

work page 2021

[77] [77]

Openpack: A large- scale dataset for recognizing packaging works in iot-enabled logistic environments

Naoya Yoshimura, Jaime Morales, Takuya Maekawa, and Takahiro Hara. Openpack: A large- scale dataset for recognizing packaging works in iot-enabled logistic environments. In2024 IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 90–97, 2024. doi: 10.1109/PerCom59722.2024.10494448

work page doi:10.1109/percom59722.2024.10494448 2024

[78] [78]

Cafe: Unifying representation and generation with contrastive-autoregressive finetuning

Hao Yu, Zhuokai Zhao, Shen Yan, Lukasz Korycki, Jianyu Wang, Baosheng He, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, and Hanchao Yu. Cafe: Unifying representation and generation with contrastive-autoregressive finetuning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6286–6297, 2025

work page 2025

[79] [79]

MoPFormer: Motion- primitive transformer for wearable-sensor activity recognition

Hao Zhang, Zhan Zhuang, Xuehao Wang, Xiaodong Yang, and Yu Zhang. MoPFormer: Motion- primitive transformer for wearable-sensor activity recognition. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview. net/forum?id=Ty9n72fZ1K

work page 2025

[80] [80]

Usc-had: A daily activity dataset for ubiquitous activity recognition using wearable sensors

Mi Zhang and Alexander A Sawchuk. Usc-had: A daily activity dataset for ubiquitous activity recognition using wearable sensors. InProceedings of the 2012 ACM conference on ubiquitous computing, pages 1036–1043, 2012

work page 2012