pith. machine review for the scientific record

arxiv: 2604.17065 · v1 · submitted 2026-04-18 · 💻 cs.CV

Recognition: unknown

BasketHAR: A Multimodal Dataset for Human Activity Recognition and Sport Analysis in Basketball Training Scenarios

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 06:33 UTC · model grok-4.3

classification 💻 cs.CV
keywords: human activity recognition · multimodal dataset · basketball · sports analytics · inertial sensors · video data · performance analysis

The pith

BasketHAR supplies a multimodal dataset of professional basketball actions for human activity recognition and sports analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BasketHAR to fill a gap in human activity recognition datasets, which mostly cover basic movements such as walking. It collects synchronized data from inertial measurement units (accelerometers and gyroscopes), magnetometers, heart rate monitors, skin temperature sensors, and video cameras during professional-level basketball training. A baseline method for aligning these modalities is provided so that researchers can benchmark performance. This matters because it opens the door to specialized applications: analyzing athlete performance and generating training reports with advanced recognition techniques.

Core claim

BasketHAR is a novel multimodal HAR dataset tailored for basketball training, covering a diverse set of professional-level actions. It combines motion data from inertial measurement units, angular velocity, magnetic field, heart rate, skin temperature, and synchronized video recordings, and ships with a baseline multimodal alignment method whose results underscore the dataset's complexity and its suitability for advanced HAR tasks and sports analytics.

What carries the argument

The BasketHAR dataset itself, which combines multiple sensor streams from IMUs, physiological monitors, and video for basketball-specific activities, supported by a baseline alignment method for multimodal data.
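The baseline's exact design is not reproduced on this page, so purely as orientation, here is a minimal sketch of the contrastive IMU-video alignment family used in prior work such as IMU2CLIP [15] and ImageBind [7]. The toy encoder, tensor shapes, and temperature below are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch of contrastive IMU-video alignment (InfoNCE-style),
# in the spirit of IMU2CLIP [15]; NOT the paper's actual baseline.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IMUEncoder(nn.Module):
    """Toy 1D-CNN over an IMU window shaped (batch, channels, time)."""
    def __init__(self, in_channels: int = 6, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool out the time axis
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.net(x).squeeze(-1))

def info_nce(imu_emb: torch.Tensor, vid_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss: the i-th IMU window and the i-th video
    clip form a positive pair; all other pairs in the batch are negatives."""
    imu_emb = F.normalize(imu_emb, dim=-1)
    vid_emb = F.normalize(vid_emb, dim=-1)
    logits = imu_emb @ vid_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

The design point is that temporally matched windows supervise the alignment, so no per-frame labels are needed; whether the paper's method works this way cannot be verified from this page alone.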

If this is right

  • Researchers can develop and test HAR models specifically for sports performance analysis using the provided data (a minimal windowing sketch follows this list).
  • Training sessions can be analyzed to produce specialized performance reports based on recognized actions.
  • The baseline method offers a reproducible starting point for comparing multimodal fusion approaches in HAR.
  • The dataset's professional-level actions present greater classification complexity than the basic activities in standard HAR datasets.
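On the first point above: building HAR models on data like this conventionally starts by segmenting the continuous sensor streams into fixed-length, overlapping windows. A minimal sketch; the 50 Hz rate, 2 s window, and 50% overlap are conventions borrowed from basic-activity HAR datasets, not parameters taken from this paper.

```python
import numpy as np

def sliding_windows(signal: np.ndarray, labels: np.ndarray,
                    window: int = 100, stride: int = 50):
    """Segment a (time, channels) sensor stream into overlapping windows.

    Each window gets the majority label of its samples. At an assumed
    50 Hz, window=100 and stride=50 give 2 s windows with 50% overlap.
    """
    X, y = [], []
    for start in range(0, len(signal) - window + 1, stride):
        seg_labels = labels[start:start + window]
        vals, counts = np.unique(seg_labels, return_counts=True)
        X.append(signal[start:start + window])
        y.append(vals[np.argmax(counts)])
    return np.stack(X), np.array(y)
```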

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Models trained on this data could enable real-time coaching tools that detect technique errors during practice.
  • Similar datasets might be created for other sports to expand specialized HAR applications.
  • The public availability allows for community-driven improvements in alignment techniques and activity classification.

Load-bearing premise

The recordings from the sensors and video accurately capture and represent professional-level basketball activities in a way that is generalizable to real training scenarios.

What would settle it

If standard HAR models achieved performance on BasketHAR comparable to what they reach on basic-activity datasets, without needing specialized multimodal methods, the claim that the dataset is distinctively suited to advanced tasks would be undermined.
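Concretely, that test could be run as below: fix one standard unimodal model, train it on windowed inertial data from a basic-activity dataset and from BasketHAR, and compare macro-F1. The `load_windows` calls are stubs (access paths are not specified here); only the comparison logic is sketched.

```python
# Hypothetical falsification harness: does a standard unimodal HAR model
# score on BasketHAR roughly as it does on a basic-activity dataset?
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(X: np.ndarray, y: np.ndarray, seed: int = 0) -> float:
    """Macro-F1 of a fixed random forest on flattened sensor windows."""
    Xf = X.reshape(len(X), -1)  # flatten (window, channels) per sample
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xf, y, test_size=0.3, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te), average="macro")

# X_basic, y_basic = load_windows("UCI-HAR")    # stub: basic activities
# X_bball, y_bball = load_windows("BasketHAR")  # stub: basketball actions
# Comparable scores would undercut the 'advanced tasks' claim;
# a large gap on BasketHAR would support it.
```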

Figures

Figures reproduced from arXiv: 2604.17065 by Haoyue Zhang, Jiacheng Ruan, Ting Liu, Xian Gao, Yuzhuo Fu, Zongyun Zhang.

Figure 1: (a) Axes of Motion Relative to User. (b) Sensor Place… [image: figures/full_fig_p002_1.png]
Figure 2: Dataset Overview. The BasketHAR dataset encompasses three modalities of data collected synchronously during… [image: figures/full_fig_p003_2.png]
Figure 3: The proposed multimodal alignment approach as a… [image: figures/full_fig_p004_3.png]
Figure 4: The t-SNE visualization of features extracted by our… [image: figures/full_fig_p005_4.png]
Figure 6: A human-in-the-loop prompt optimization frame… [image: figures/full_fig_p006_6.png]
original abstract

Human Activity Recognition (HAR) involves the automatic identification of user activities and has gained significant research interest due to its broad applicability. Most HAR systems rely on supervised learning, which necessitates large, diverse, and well-annotated datasets. However, existing datasets predominantly focus on basic activities such as walking, standing, and stair navigation, limiting their utility in specialized contexts like sports performance analysis. To address this gap, we present BasketHAR, a novel multimodal HAR dataset tailored for basketball training, encompassing a diverse set of professional-level actions. BasketHAR includes comprehensive motion data from inertial measurement units (accelerometers and gyroscopes), angular velocity, magnetic field, heart rate, skin temperature, and synchronized video recordings. We also provide a baseline multimodal alignment method to benchmark performance. Experimental results underscore the dataset's complexity and suitability for advanced HAR tasks. Furthermore, we highlight its potential applications in the analysis of basketball training sessions and in the generation of specialized performance reports, representing a valuable resource for future research in HAR and sports analytics. The dataset are publicly accessible at https://huggingface.co/datasets/Xian-Gao/BasketHAR licensed under Apache License 2.0.
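Since the abstract points to a Hugging Face repository, access presumably goes through the standard `datasets` library. A minimal sketch; the splits and column names are not documented on this page, so the snippet only inspects them rather than assuming a schema.

```python
# Loading BasketHAR via the Hugging Face `datasets` library. The repo ID
# comes from the URL in the abstract; everything else should be checked
# against the dataset card rather than taken from this sketch.
from datasets import load_dataset

ds = load_dataset("Xian-Gao/BasketHAR")
print(ds)                          # lists the available splits and columns
first_split = next(iter(ds))
print(ds[first_split][0].keys())   # field names of one record
```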

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces BasketHAR, a novel multimodal dataset for human activity recognition (HAR) in basketball training scenarios. It collects synchronized streams from inertial measurement units (accelerometers, gyroscopes, angular velocity, magnetic field), physiological sensors (heart rate, skin temperature), and video recordings covering a diverse set of professional-level basketball actions. The authors also provide a baseline multimodal alignment method and report experimental results intended to demonstrate the dataset's complexity and suitability for advanced HAR tasks and sports analytics. The dataset is released publicly on Hugging Face under the Apache 2.0 license.

Significance. If the data collection protocols, synchronization accuracy, and annotation quality are as described, BasketHAR would address a clear gap in specialized sports-focused HAR datasets, enabling new work on performance analysis, training optimization, and multimodal modeling in dynamic, high-variance environments. The public release with an open license directly supports reproducibility and community extension.

major comments (1)
  1. §4 (Baseline Experiments): The multimodal alignment method is positioned as a benchmark, yet the manuscript provides no quantitative metrics, ablation studies, or comparisons against unimodal baselines or prior alignment techniques; without these, the experimental results cannot fully substantiate the claim that they 'underscore the dataset's complexity and suitability for advanced HAR tasks.'
minor comments (2)
  1. Abstract: The sentence 'The dataset are publicly accessible' contains a subject-verb agreement error and should read 'The dataset is publicly accessible.'
  2. §2 (Related Work): The discussion of existing HAR datasets would benefit from a table summarizing key attributes (modalities, activity types, scale) to more clearly position BasketHAR's novelty.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript and the positive overall assessment of BasketHAR. We address the major comment below and will revise the paper accordingly to strengthen the experimental section.

point-by-point responses
  1. Referee: [—] §4 (Baseline Experiments): The multimodal alignment method is positioned as a benchmark, yet the manuscript provides no quantitative metrics, ablation studies, or comparisons against unimodal baselines or prior alignment techniques; without these, the experimental results cannot fully substantiate the claim that they 'underscore the dataset's complexity and suitability for advanced HAR tasks.'

    Authors: We agree that the current description of the baseline multimodal alignment method in Section 4 would benefit from additional quantitative support to fully substantiate the claims regarding dataset complexity and utility. In the revised manuscript, we will expand this section to include: (1) quantitative metrics such as alignment error rates, synchronization accuracy (e.g., temporal offset statistics), and downstream HAR performance (precision, recall, F1-score) using the aligned multimodal streams; (2) ablation studies isolating the contribution of each modality (IMU, physiological, video); and (3) comparisons against standard unimodal baselines and prior alignment techniques (e.g., dynamic time warping and cross-modal attention). These additions will be supported by new tables and figures while keeping the focus on the dataset itself. revision: yes
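On the promised synchronization statistics in point (1): a simple, widely used estimator of a constant temporal offset between two modalities is the peak of their cross-correlation (dynamic time warping, which the response also names, generalizes to drifting offsets). A minimal sketch, assuming both streams have been reduced to 1-D activity-energy traces at a shared sample rate; the 50 Hz figure in the usage comment is hypothetical.

```python
import numpy as np

def estimate_offset(a: np.ndarray, b: np.ndarray, rate_hz: float) -> float:
    """Estimate a constant offset (seconds) between two 1-D signals at the
    same sample rate via the cross-correlation peak. A positive result
    means events in `a` occur later than the same events in `b`."""
    a = (a - a.mean()) / (a.std() + 1e-8)  # zero-mean, unit-variance
    b = (b - b.mean()) / (b.std() + 1e-8)
    xcorr = np.correlate(a, b, mode="full")
    lag = int(np.argmax(xcorr)) - (len(b) - 1)
    return lag / rate_hz

# Example (hypothetical inputs): IMU acceleration magnitude vs. per-frame
# optical-flow energy, both resampled to 50 Hz.
# offset_s = estimate_offset(imu_energy, flow_energy, rate_hz=50.0)
```

Reporting the distribution of such offsets across sessions would give the "temporal offset statistics" the response promises.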

Circularity Check

0 steps flagged

No significant circularity in dataset release paper

full rationale

This is a dataset presentation paper with no mathematical derivations, fitted parameters, predictions, or load-bearing self-citations. The central contribution is the public release of BasketHAR (with IMU, physiological, and video streams) plus a baseline alignment method; the claim of utility is directly supported by the Apache 2.0 Hugging Face release and does not reduce to any internal fit or self-referential definition. No steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Dataset papers rest on standard assumptions about sensor accuracy and annotation quality rather than new axioms or parameters.

pith-pipeline@v0.9.0 · 5522 in / 969 out tokens · 37046 ms · 2026-05-10T06:33:06.332824+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

24 extracted references · 21 canonical work pages · 2 internal anchors

  1. [1]

    Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge L. Reyes-Ortiz. 2013. A Public Domain Dataset for Human Activity Recognition Using Smartphones. Computational Intelligence (2013).

  2. [2]

    Sara Ashry, Tetsuji Ogawa, and Walid Gomaa. 2020. CHARM-Deep: Continuous Human Activity Recognition Model Based on Deep Neural Network Using IMU Sensors of Smartwatch. IEEE Sensors Journal 20, 15 (Aug. 2020), 8757–8770. doi:10.1109/JSEN.2020.2985374

  3. [3]

    Eduardo Casilari, Jose A. Santoyo-Ramón, and Jose M. Cano-García. 2017. UMAFall: A Multisensor Dataset for the Research on Automatic Fall Detection. Procedia Computer Science 110 (2017), 32–39. doi:10.1016/j.procs.2017.06.110

  4. [4]

    Ricardo Chavarriaga, Hesam Sagha, Alberto Calatroni, Sundara Tejaswi Digumarti, Gerhard Tröster, José Del R. Millán, and Daniel Roggen. 2013. The Opportunity Challenge: A Benchmark Database for on-Body Sensor-Based Activity Recognition. Pattern Recognition Letters 34, 15 (Nov. 2013), 2033–2042. doi:10.1016/j.patrec.2012.12.014

  5. [5]

    Xian Gao, Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Zongyun Zhang, Ting Liu, and Yuzhuo Fu. 2025. From Motion Signals to Insights: A Unified Framework for Student Behavior Analysis and Feedback in Physical Education Classes. arXiv:2503.06525

  6. [6]

    Daniel Garcia-Gonzalez, Daniel Rivero, Enrique Fernandez-Blanco, and Miguel R. Luaces. 2020. A Public Domain Dataset for Real-Life Human Activity Recognition Using Smartphone Sensors. Sensors 20, 8 (April 2020), 2200. doi:10.3390/s20082200

  7. [7]

    Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. 2023. ImageBind: One Embedding Space to Bind Them All. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Vancouver, BC, Canada, 15180–15190. doi:10.1109/CVPR52729.2023.01457

  8. [8]

    Yu Guan and Thomas Plötz. 2017. Ensembles of Deep LSTM Learners for Activity Recognition Using Wearables. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 2 (June 2017), 1–28. doi:10.1145/3090076

  9. [9]

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-Rank Adaptation of Large Language Models. doi:10.48550/arXiv.2106.09685 arXiv:2106.09685 [cs]

  10. [10]

    Wenbo Huang, Lei Zhang, Wenbin Gao, Fuhong Min, and Jun He. 2021. Shallow Convolutional Neural Networks for Human Activity Recognition Using Wearable Sensors. IEEE Transactions on Instrumentation and Measurement 70 (2021), 1–11. doi:10.1109/TIM.2021.3091990

  11. [11]

    Masaya Inoue, Sozo Inoue, and Takeshi Nishida. 2018. Deep Recurrent Neural Network for Mobile Human Activity Recognition with High Throughput. Artificial Life and Robotics 23, 2 (June 2018), 173–185. doi:10.1007/s10015-017-0422-x

  12. [12]

    Nobuo Kawaguchi, Nobuhiro Ogawa, Yohei Iwasaki, Katsuhiko Kaji, Tsutomu Terada, Kazuya Murao, Sozo Inoue, Yoshihiro Kawahara, Yasuyuki Sumi, and Nobuhiko Nishio. 2011. HASC Challenge: Gathering Large Scale Human Activity Corpus for the Real-World Activity Understandings. In Proceedings of the 2nd Augmented Human International Conference. ACM, Tokyo, Japan, ...

  13. [13]

    Diederik P. Kingma and Jimmy Ba. 2017. Adam: A Method for Stochastic Optimization. doi:10.48550/arXiv.1412.6980 arXiv:1412.6980 [cs]

  14. [14]

    Jennifer R. Kwapisz, Gary M. Weiss, and Samuel A. Moore. 2011. Activity Recognition Using Cell Phone Accelerometers. ACM SIGKDD Explorations Newsletter 12, 2 (March 2011), 74–82. doi:10.1145/1964897.1964918

  15. [15]

    Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, and Babak Damavandi. 2022. IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text. doi:10.48550/arXiv.2210.14395 arXiv:2210.14395 [cs]

  16. [16]

    Abdulmajid Murad and Jae-Young Pyun. 2017. Deep Recurrent Neural Networks for Human Activity Recognition. Sensors 17, 11 (Nov. 2017), 2556. doi:10.3390/s17112556

  17. [17]

    Jorge-Luis Reyes-Ortiz, Luca Oneto, Alessandro Ghio, Albert Samá, Davide Anguita, and Xavier Parra. 2014. Human Activity Recognition on Smartphones with Awareness of Basic Activities and Postural Transitions. In Artificial Neural Networks and Machine Learning – ICANN 2014, Stefan Wermter, Cornelius Weber, Włodzisław Duch, Timo Honkela, Petia Koprinkova-H...

  18. [18]

    Charissa Ann Ronao and Sung-Bae Cho. 2016. Human Activity Recognition with Smartphone Sensors Using Deep Learning Neural Networks. Expert Systems with Applications 59 (Oct. 2016), 235–244. doi:10.1016/j.eswa.2016.04.032

  19. [19]

    Swapnil Sayan Saha, Shafizur Rahman, Miftahul Jannat Rasna, A.K.M. Mahfuzul Islam, and Md. Atiqur Rahman Ahad. 2018. DU-MD: An Open-Source Human Action Dataset for Ubiquitous Wearable Sensors. In 2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recogn... doi:10.1109/ICIEV.2018.8641051

  21. [21]

    Niloy Sikder and Abdullah-Al Nahid. 2021. KU-HAR: An Open Dataset for Heterogeneous Human Activity Recognition. Pattern Recognition Letters 146 (June 2021), 46–54. doi:10.1016/j.patrec.2021.02.024

  22. [22]

    Allen Y. Yang, Roozbeh Jafari, S. Shankar Sastry, and Ruzena Bajcsy. 2009. Distributed Recognition of Human Actions Using Wearable Motion Sensor Networks. Journal of Ambient Intelligence and Smart Environments 1, 2 (2009), 103–115. doi:10.3233/AIS-2009-0016

  23. [23]

    Ming Zeng, Le T. Nguyen, Bo Yu, Ole J. Mengshoel, Jiang Zhu, Pang Wu, and Joy Zhang. 2014. Convolutional Neural Networks for Human Activity Recognition Using Mobile Sensors. In 6th International Conference on Mobile Computing, Applications and Services. 197–205. doi:10.4108/icst.mobicase.2014.257786

  24. [24]

    Mi Zhang and Alexander A. Sawchuk. 2012. USC-HAD: A Daily Activity Dataset for Ubiquitous Activity Recognition Using Wearable Sensors. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, Pittsburgh, Pennsylvania, 1036–1043. doi:10.1145/2370216.2370438