Pith · machine review for the scientific record

arxiv: 2605.04791 · v2 · submitted 2026-05-06 · 💻 cs.HC


OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches


Pith reviewed 2026-05-08 16:25 UTC · model grok-4.3

classification 💻 cs.HC
keywords smartwatch · gesture recognition · multimodal benchmark · IMU · PPG · mixture of experts · foundation model adaptation · wearable sensing

The pith

A specialized smartwatch model outperforms adapted foundation models for hand gesture recognition while using far fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OpenWatch as the first open-access multimodal benchmark for wrist-based hand gesture recognition, collecting over 10 hours of synchronized IMU and PPG data from 50 participants across a vocabulary of 59 gestures. It develops MixToken, a task-specific architecture, and NormWear-LoRA, a low-rank adapter for smartwatch foundation models, then evaluates both under a subject-independent protocol. MixToken reaches substantially higher accuracy (90% vs 66% F1) at a far smaller footprint (223k vs 136M parameters) than the adapted foundation models, and PPG signals add meaningful predictive value, especially for the larger models. This matters because smartwatches are already ubiquitous yet lack reliable public tools for developing accurate on-device gesture controls that respect battery and privacy limits.

Core claim

The authors establish that task-specific architectures substantially outperform finetuned smartwatch foundation models on the OpenWatch benchmark in both accuracy and memory efficiency, while PPG signals carry a sizable predictive benefit (+12.5% F1-score) for the foundation models.

What carries the argument

MixToken, a task-specific mixture-of-experts that fuses per-channel IMU filterbank features with cross-channel statistical tokens through learned logit mixing.
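The mixing mechanism is easy to picture in code. Below is a minimal sketch of learned logit mixing between two expert heads, one over per-channel filterbank features and one over cross-channel statistical tokens; the dimensions, heads, and gate here are hypothetical stand-ins, not the authors' published implementation:

```python
import torch
import torch.nn as nn

class LogitMixingSketch(nn.Module):
    """Hedged sketch of MixToken-style learned logit mixing.

    Two experts each emit class logits; a learned gate produces per-example
    mixing weights. All dimensions here are illustrative assumptions.
    """

    def __init__(self, filterbank_dim: int, stats_dim: int, n_classes: int):
        super().__init__()
        self.filterbank_head = nn.Linear(filterbank_dim, n_classes)  # per-channel IMU filterbank expert
        self.stats_head = nn.Linear(stats_dim, n_classes)            # cross-channel statistical-token expert
        self.gate = nn.Linear(filterbank_dim + stats_dim, 2)         # learned mixing weights over the two experts

    def forward(self, filterbank_feats: torch.Tensor, stats_feats: torch.Tensor) -> torch.Tensor:
        logits_fb = self.filterbank_head(filterbank_feats)
        logits_st = self.stats_head(stats_feats)
        weights = torch.softmax(self.gate(torch.cat([filterbank_feats, stats_feats], dim=-1)), dim=-1)
        # Convex combination of the experts' class logits, per example.
        return weights[..., 0:1] * logits_fb + weights[..., 1:2] * logits_st
```

On this reading, the fusion-weight dynamics plotted in Figure 4 would correspond to tracking `weights` over the course of training.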

If this is right

  • Gesture recognition on resource-constrained wearables can reach high accuracy without large models.
  • PPG signals should be included when adapting foundation models to wearable sensing tasks.
  • Specialized architecture design offers better trade-offs than foundation-model adaptation for this domain.
  • The open benchmark enables direct comparisons of future methods for multimodal wrist sensing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • For other narrow wearable tasks, custom lightweight models may routinely outperform general foundation-model fine-tuning.
  • The benchmark could support testing of gesture vocabularies that feel more natural in daily watch use.
  • Releasing the data invites extensions that add real-world noise such as varying arm positions or skin tones.

Load-bearing premise

The 50 participants, 59 gestures, and subject-independent data splits are representative of real-world use without introducing biases or leakage that would exaggerate the performance gaps.
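The leakage half of this premise is mechanically checkable. Below is a minimal sketch of a user-level split, with `X`, `y`, and `subject_ids` as hypothetical stand-ins for the OpenWatch windows (the paper's actual split procedure is what the referee asks for later in this page):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))               # windowed sensor features (stand-in)
y = rng.integers(0, 59, size=1000)            # gesture labels, 59 classes
subject_ids = rng.integers(0, 50, size=1000)  # which of the 50 participants produced each window

# Splitting on groups=subject_ids guarantees no participant appears on
# both sides, which is the minimum "subject-independent" must mean.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=subject_ids))
assert set(subject_ids[train_idx]).isdisjoint(subject_ids[test_idx])
```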

What would settle it

Collecting a new dataset from different users performing the same 59 gestures and finding that MixToken no longer shows clear accuracy or parameter-efficiency gains over the fine-tuned foundation models.

Figures

Figures reproduced from arXiv: 2605.04791 by Andrea Ronco, Daniel Eckert, Dengxin Dai, Junjie Zeng, Michele Magno, Pietro Bonazzi, Youssef Ahmed.

Figure 1: Overall pipeline: dataset, preprocessing and augmentation, compared model families…
Figure 2: The five benchmark gestures: double clench, double pinch, pinch down, pinch up, and slide.
Figure 3: Clip-level macro-F1 comparison across all models: window-level without augmentation…
Figure 4: Fusion-weight dynamics for MixToken, trained with and without data augmentation.
Figure 5: MixToken confusion matrices (test set, with augmentation): window-level row-normalized…
Figure 6: t-SNE visualization of clip-level embeddings (test set). Left: NormWear-Base (frozen)…
Figure 7: Effect of PPG on NormWear-Base (no augmentation). Adding PPG improves window-level macro-F1 by +0.035 and clip-level by +0.125 (a 3.6× ratio), suggesting PPG provides a useful, temporally stabilizing signal for gesture prediction in wearable foundation models.
Figure 8: Data collection interface. Top: gesture selection, instruction display with countdown…
Figure 9: Subjective evaluation of gesture quality. Top: mean usability scores per gesture. Bottom: …
Figure 10: Feature group masking ablation for MixToken (with augmentation). Each group…
Figure 11: UCI-HAR analysis. Top: linear-probe performance before and after LoRA. Bottom: …
Figure 12: NormWear-LoRA confusion matrices (test set, with augmentation): window-level row-normalized…
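Several captions distinguish window-level from clip-level macro-F1. Clip-level scoring collapses per-window predictions into one prediction per gesture clip; the paper's aggregation rule is not reproduced above, so the sketch below assumes mean-pooled window probabilities (majority voting over windows would be a comparable choice):

```python
import numpy as np
from sklearn.metrics import f1_score

def clip_level_macro_f1(window_probs, window_clip_ids, clip_labels):
    """Macro-F1 after aggregating window probabilities per clip.

    window_probs: (n_windows, n_classes) softmax outputs.
    window_clip_ids: (n_windows,) clip index for each window.
    clip_labels: (n_clips,) true gesture per clip.
    Mean-pooling is an assumption, not the paper's stated rule.
    """
    clip_ids = np.unique(window_clip_ids)
    preds = [window_probs[window_clip_ids == c].mean(axis=0).argmax() for c in clip_ids]
    return f1_score(clip_labels[clip_ids], preds, average="macro")
```

On this reading, Figure 7's 3.6× larger clip-level gain from PPG would mean the PPG stream mostly stabilizes predictions across consecutive windows rather than sharpening individual ones.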
Original abstract

Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multi-modal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commercial smartwatch. It contains over 10 hours of Inertial Measurement Unit (IMU) and Photoplethysmography (PPG) data across 50 participants and a vocabulary of 59 labelled gesture sequences. Furthermore, we present a subject-independent evaluation protocol including traditional and deep learning methods for time-series classification. On top of this, we develop two novel methodologies for hand-gesture recognition: (i) MixToken, a task-specific mixture-of-experts that fuses per-channel IMU filterbank features with cross-channel statistical tokens through learned logit mixing, and (ii) NormWear-LoRA, a low-rank adaptation module for smartwatch foundation models. Our benchmarking results reveal that PPG signals carry a substantial predictive benefit (+12.5% F1-score) for foundational smartwatch models. In addition, we show that task-specific architectures (i.e. MixToken) substantially outperform finetuned smartwatch foundation models in terms of accuracy (F1-score=90% vs 66%) and memory efficiency (223k vs 136M parameters). Finally, we provide clear empirical guidance on the trade-offs between specialized architecture design, modality fusion, data augmentations, and foundation-model adaptation for resource-constrained wearable sensing.
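The abstract's second method, NormWear-LoRA, sits in the lineage of LoRA (Hu et al., 2021): freeze the pretrained weights and learn a low-rank additive update. A generic sketch of a LoRA-wrapped linear layer follows; the rank, scaling, and target layers here are illustrative defaults, not the paper's configuration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA wrapper: y = W x + (alpha / r) * B(A(x)), with W frozen.

    Illustrative defaults only; NormWear-LoRA's rank, scaling, and choice of
    adapted layers are not specified in this review.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # the foundation model stays frozen
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # start as a zero update, so training begins at the base model
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))
```

Note that the abstract's 136M-parameter figure is the foundation model's full inference footprint: LoRA shrinks the trainable set, not the deployed model, which is presumably why the 223k-parameter MixToken wins the memory comparison outright.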

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This paper introduces OpenWatch, the first open-access multimodal benchmark for hand gesture recognition on smartwatches, featuring over 10 hours of synchronized IMU and PPG data from 50 participants performing 59 gestures. It proposes MixToken, a task-specific mixture-of-experts architecture that fuses IMU filterbank features with statistical tokens, and NormWear-LoRA, a low-rank adaptation for smartwatch foundation models. The benchmarking results demonstrate that MixToken achieves an F1-score of 90% with 223k parameters, outperforming finetuned foundation models at 66% F1 with 136M parameters, and that PPG signals provide a +12.5% F1-score improvement for foundation models. The work also provides empirical guidance on architecture design, modality fusion, and data augmentations for wearable sensing.

Significance. If the central claims hold under the subject-independent protocol, this manuscript offers a valuable contribution to the field of wearable computing and HCI by establishing an open benchmark that addresses the scarcity of multimodal datasets for smartwatch gesture recognition. The novel architectures and the demonstrated advantages of task-specific models over large foundation models in terms of accuracy and efficiency, as well as the predictive benefit of PPG, provide actionable insights for developing practical systems on resource-limited devices. The open release of the dataset and code would further enhance its impact by enabling reproducible research.

major comments (2)
  1. [Evaluation Protocol] The subject-independent evaluation protocol is load-bearing for the performance claims (90% vs 66% F1 and +12.5% PPG benefit). The manuscript must explicitly detail the participant split methodology, including how the 50 users are divided into train/validation/test sets, any stratification criteria, and confirmation of no user overlap or leakage, as any bias here would undermine the generalizability of the architectural comparisons.
  2. [Results (performance tables)] The reported performance numbers, including the exact F1 scores and parameter counts, should be accompanied by details on data augmentation procedures, hyperparameter selection, and statistical significance testing (e.g., p-values or confidence intervals across multiple runs) to substantiate that the gains are not artifacts of the experimental setup.
minor comments (2)
  1. [Abstract] The abstract mentions 'traditional and deep learning methods' but does not specify which ones are included in the benchmark; adding this would improve clarity.
  2. Ensure all figures have clear captions describing the axes and what is being compared, particularly for the memory efficiency vs accuracy trade-off plots.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments below and will make the necessary revisions to improve the clarity and rigor of the evaluation protocol and results sections.

Point-by-point responses
  1. Referee: [Evaluation Protocol] The subject-independent evaluation protocol is load-bearing for the performance claims (90% vs 66% F1 and +12.5% PPG benefit). The manuscript must explicitly detail the participant split methodology, including how the 50 users are divided into train/validation/test sets, any stratification criteria, and confirmation of no user overlap or leakage, as any bias here would undermine the generalizability of the architectural comparisons.

    Authors: We agree that explicit details on the subject-independent split are necessary to support the generalizability of our claims. In the revised manuscript, we will add a detailed description in the Evaluation Protocol section outlining the participant division methodology, including the specific allocation of the 50 users to train, validation, and test sets, the random assignment process at the user level to prevent any overlap or leakage, and any criteria used for ensuring balanced representation of gestures across splits. This will allow readers to fully reproduce and assess the protocol. revision: yes

  2. Referee: [Results (performance tables)] The reported performance numbers, including the exact F1 scores and parameter counts, should be accompanied by details on data augmentation procedures, hyperparameter selection, and statistical significance testing (e.g., p-values or confidence intervals across multiple runs) to substantiate that the gains are not artifacts of the experimental setup.

    Authors: We acknowledge the importance of providing full experimental details to validate the performance differences. In the revised manuscript, we will include additional information in the Results section on the data augmentation procedures applied during training, the method used for hyperparameter selection (such as validation-based tuning), and statistical significance testing including confidence intervals or p-values computed over multiple independent runs. These additions will help confirm that the observed improvements, such as the F1-score gains and efficiency advantages, are reliable and not due to specific experimental choices. revision: yes
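For context on point 2, the interval reporting the referee requests is inexpensive to produce. A minimal sketch with made-up macro-F1 values (the review shows no per-run scores), assuming five independently seeded training runs:

```python
import numpy as np
from scipy import stats

# Hypothetical macro-F1 from five independently seeded runs (illustrative only).
run_scores = np.array([0.89, 0.91, 0.90, 0.88, 0.92])

mean = run_scores.mean()
sem = stats.sem(run_scores)  # standard error of the mean (ddof=1)
low, high = stats.t.interval(0.95, len(run_scores) - 1, loc=mean, scale=sem)
print(f"macro-F1 = {mean:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```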

Circularity Check

0 steps flagged

No circularity: new benchmark and empirical results are self-contained.

Full rationale

The paper introduces an original multimodal dataset (OpenWatch) with 50 participants, 59 gestures, and synchronized IMU+PPG recordings, then reports standard classification metrics on subject-independent splits using both baseline methods and two new architectures (MixToken, NormWear-LoRA). No equations, fitted parameters, or self-citations are shown that would reduce the reported F1 scores, parameter counts, or modality deltas to quantities defined by the paper's own inputs or prior author work. The performance claims rest on direct experimentation against external benchmarks rather than any self-definitional or renaming step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claims rest on standard supervised learning assumptions for time-series data and the representativeness of the collected gestures; no new physical entities are postulated.

axioms (1)
  • domain assumption Subject-independent splits produce unbiased estimates of generalization performance for gesture recognition models.
    Invoked in the evaluation protocol description.
invented entities (2)
  • MixToken no independent evidence
    purpose: Task-specific mixture-of-experts fusing per-channel IMU filterbank features with cross-channel statistical tokens.
    New architecture introduced for this benchmark.
  • NormWear-LoRA no independent evidence
    purpose: Low-rank adaptation module for smartwatch foundation models.
    New adaptation technique proposed in the paper.

pith-pipeline@v0.9.0 · 5587 in / 1341 out tokens · 42685 ms · 2026-05-08T16:25:45.658566+00:00 · methodology

