pith. machine review for the scientific record

arxiv: 2604.17065 · v1 · submitted 2026-04-18 · 💻 cs.CV

Recognition: unknown

BasketHAR: A Multimodal Dataset for Human Activity Recognition and Sport Analysis in Basketball Training Scenarios

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 06:33 UTC · model grok-4.3

classification 💻 cs.CV
keywords: human activity recognition · multimodal dataset · basketball · sports analytics · inertial sensors · video data · performance analysis

The pith

BasketHAR supplies a multimodal dataset of professional basketball actions for human activity recognition and sports analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BasketHAR to fill a gap in human activity recognition datasets, which mostly cover basic movements such as walking. It collects synchronized data from inertial measurement units (accelerometers and gyroscopes), magnetometers, heart rate monitors, skin temperature sensors, and video cameras during professional-level basketball training. A baseline method for aligning these modalities is provided so that researchers can benchmark performance. This matters because it opens the door to specialized applications: analyzing athlete performance and generating training reports with advanced recognition techniques.

Core claim

BasketHAR is a novel multimodal HAR dataset tailored for basketball training, covering a diverse set of professional-level actions. It combines motion data from inertial measurement units, angular velocity, magnetic field, heart rate, skin temperature, and synchronized video recordings, and ships with a baseline multimodal alignment method whose results underscore the dataset's complexity and its suitability for advanced HAR tasks and sports analytics.

What carries the argument

The BasketHAR dataset itself, which combines multiple sensor streams from IMUs, physiological monitors, and video for basketball-specific activities, supported by a baseline alignment method for multimodal data.
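The baseline's exact design is not reproduced on this page, so purely as orientation, here is a minimal sketch of the contrastive IMU-video alignment family used in prior work such as IMU2CLIP [15] and ImageBind [7]. The toy encoder, tensor shapes, and temperature below are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch of contrastive IMU-video alignment (InfoNCE-style),
# in the spirit of IMU2CLIP [15]; NOT the paper's actual baseline.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IMUEncoder(nn.Module):
    """Toy 1D-CNN over an IMU window shaped (batch, channels, time)."""
    def __init__(self, in_channels: int = 6, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool out the time axis
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.net(x).squeeze(-1))

def info_nce(imu_emb: torch.Tensor, vid_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss: the i-th IMU window and the i-th video
    clip form a positive pair; all other pairs in the batch are negatives."""
    imu_emb = F.normalize(imu_emb, dim=-1)
    vid_emb = F.normalize(vid_emb, dim=-1)
    logits = imu_emb @ vid_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

The design point is that temporally matched windows supervise the alignment, so no per-frame labels are needed; whether the paper's method works this way cannot be verified from this page alone.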

If this is right

  • Researchers can develop and test HAR models specifically for sports performance analysis using the provided data (a minimal windowing sketch follows this list).
  • Training sessions can be analyzed to produce specialized performance reports based on recognized actions.
  • The baseline method offers a reproducible starting point for comparing multimodal fusion approaches in HAR.
  • The dataset's professional-level actions present greater classification complexity than the basic activities in standard HAR datasets.
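On the first point above: building HAR models on data like this conventionally starts by segmenting the continuous sensor streams into fixed-length, overlapping windows. A minimal sketch; the 50 Hz rate, 2 s window, and 50% overlap are conventions borrowed from basic-activity HAR datasets, not parameters taken from this paper.

```python
import numpy as np

def sliding_windows(signal: np.ndarray, labels: np.ndarray,
                    window: int = 100, stride: int = 50):
    """Segment a (time, channels) sensor stream into overlapping windows.

    Each window gets the majority label of its samples. At an assumed
    50 Hz, window=100 and stride=50 give 2 s windows with 50% overlap.
    """
    X, y = [], []
    for start in range(0, len(signal) - window + 1, stride):
        seg_labels = labels[start:start + window]
        vals, counts = np.unique(seg_labels, return_counts=True)
        X.append(signal[start:start + window])
        y.append(vals[np.argmax(counts)])
    return np.stack(X), np.array(y)
```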

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Models trained on this data could enable real-time coaching tools that detect technique errors during practice.
  • Similar datasets might be created for other sports to expand specialized HAR applications.
  • The public availability allows for community-driven improvements in alignment techniques and activity classification.

Load-bearing premise

The recordings from the sensors and video accurately capture and represent professional-level basketball activities in a way that is generalizable to real training scenarios.

What would settle it

If standard HAR models achieved performance on BasketHAR comparable to what they reach on basic-activity datasets, without needing specialized multimodal methods, the claim that the dataset is distinctively suited to advanced tasks would be undermined.
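Concretely, that test could be run as below: fix one standard unimodal model, train it on windowed inertial data from a basic-activity dataset and from BasketHAR, and compare macro-F1. The `load_windows` calls are stubs (access paths are not specified here); only the comparison logic is sketched.

```python
# Hypothetical falsification harness: does a standard unimodal HAR model
# score on BasketHAR roughly as it does on a basic-activity dataset?
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(X: np.ndarray, y: np.ndarray, seed: int = 0) -> float:
    """Macro-F1 of a fixed random forest on flattened sensor windows."""
    Xf = X.reshape(len(X), -1)  # flatten (window, channels) per sample
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xf, y, test_size=0.3, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te), average="macro")

# X_basic, y_basic = load_windows("UCI-HAR")    # stub: basic activities
# X_bball, y_bball = load_windows("BasketHAR")  # stub: basketball actions
# Comparable scores would undercut the 'advanced tasks' claim;
# a large gap on BasketHAR would support it.
```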

Figures

Figures reproduced from arXiv: 2604.17065 by Haoyue Zhang, Jiacheng Ruan, Ting Liu, Xian Gao, Yuzhuo Fu, Zongyun Zhang.

Figure 1: (a) Axes of Motion Relative to User. (b) Sensor Place… [image: figures/full_fig_p002_1.png]
Figure 2: Dataset Overview. The BasketHAR dataset encompasses three modalities of data collected synchronously during… [image: figures/full_fig_p003_2.png]
Figure 3: The proposed multimodal alignment approach as a… [image: figures/full_fig_p004_3.png]
Figure 4: The t-SNE visualization of features extracted by our… [image: figures/full_fig_p005_4.png]
Figure 6: A human-in-the-loop prompt optimization frame… [image: figures/full_fig_p006_6.png]
original abstract

Human Activity Recognition (HAR) involves the automatic identification of user activities and has gained significant research interest due to its broad applicability. Most HAR systems rely on supervised learning, which necessitates large, diverse, and well-annotated datasets. However, existing datasets predominantly focus on basic activities such as walking, standing, and stair navigation, limiting their utility in specialized contexts like sports performance analysis. To address this gap, we present BasketHAR, a novel multimodal HAR dataset tailored for basketball training, encompassing a diverse set of professional-level actions. BasketHAR includes comprehensive motion data from inertial measurement units (accelerometers and gyroscopes), angular velocity, magnetic field, heart rate, skin temperature, and synchronized video recordings. We also provide a baseline multimodal alignment method to benchmark performance. Experimental results underscore the dataset's complexity and suitability for advanced HAR tasks. Furthermore, we highlight its potential applications in the analysis of basketball training sessions and in the generation of specialized performance reports, representing a valuable resource for future research in HAR and sports analytics. The dataset are publicly accessible at https://huggingface.co/datasets/Xian-Gao/BasketHAR licensed under Apache License 2.0.
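Since the abstract points to a Hugging Face repository, access presumably goes through the standard `datasets` library. A minimal sketch; the splits and column names are not documented on this page, so the snippet only inspects them rather than assuming a schema.

```python
# Loading BasketHAR via the Hugging Face `datasets` library. The repo ID
# comes from the URL in the abstract; everything else should be checked
# against the dataset card rather than taken from this sketch.
from datasets import load_dataset

ds = load_dataset("Xian-Gao/BasketHAR")
print(ds)                          # lists the available splits and columns
first_split = next(iter(ds))
print(ds[first_split][0].keys())   # field names of one record
```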

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces BasketHAR, a novel multimodal dataset for human activity recognition (HAR) in basketball training scenarios. It collects synchronized streams from inertial measurement units (accelerometers, gyroscopes, angular velocity, magnetic field), physiological sensors (heart rate, skin temperature), and video recordings covering a diverse set of professional-level basketball actions. The authors also provide a baseline multimodal alignment method and report experimental results intended to demonstrate the dataset's complexity and suitability for advanced HAR tasks and sports analytics. The dataset is released publicly on Hugging Face under the Apache 2.0 license.

Significance. If the data collection protocols, synchronization accuracy, and annotation quality are as described, BasketHAR would address a clear gap in specialized sports-focused HAR datasets, enabling new work on performance analysis, training optimization, and multimodal modeling in dynamic, high-variance environments. The public release with an open license directly supports reproducibility and community extension.

major comments (1)
  1. §4 (Baseline Experiments): The multimodal alignment method is positioned as a benchmark, yet the manuscript provides no quantitative metrics, ablation studies, or comparisons against unimodal baselines or prior alignment techniques; without these, the experimental results cannot fully substantiate the claim that they 'underscore the dataset's complexity and suitability for advanced HAR tasks.'
minor comments (2)
  1. Abstract: The sentence 'The dataset are publicly accessible' contains a subject-verb agreement error and should read 'The dataset is publicly accessible.'
  2. §2 (Related Work): The discussion of existing HAR datasets would benefit from a table summarizing key attributes (modalities, activity types, scale) to more clearly position BasketHAR's novelty.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript and the positive overall assessment of BasketHAR. We address the major comment below and will revise the paper accordingly to strengthen the experimental section.

point-by-point responses
  1. Referee: [—] §4 (Baseline Experiments): The multimodal alignment method is positioned as a benchmark, yet the manuscript provides no quantitative metrics, ablation studies, or comparisons against unimodal baselines or prior alignment techniques; without these, the experimental results cannot fully substantiate the claim that they 'underscore the dataset's complexity and suitability for advanced HAR tasks.'

    Authors: We agree that the current description of the baseline multimodal alignment method in Section 4 would benefit from additional quantitative support to fully substantiate the claims regarding dataset complexity and utility. In the revised manuscript, we will expand this section to include: (1) quantitative metrics such as alignment error rates, synchronization accuracy (e.g., temporal offset statistics), and downstream HAR performance (precision, recall, F1-score) using the aligned multimodal streams; (2) ablation studies isolating the contribution of each modality (IMU, physiological, video); and (3) comparisons against standard unimodal baselines and prior alignment techniques (e.g., dynamic time warping and cross-modal attention). These additions will be supported by new tables and figures while keeping the focus on the dataset itself. revision: yes
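On the promised synchronization statistics in point (1): a simple, widely used estimator of a constant temporal offset between two modalities is the peak of their cross-correlation (dynamic time warping, which the response also names, generalizes to drifting offsets). A minimal sketch, assuming both streams have been reduced to 1-D activity-energy traces at a shared sample rate; the 50 Hz figure in the usage comment is hypothetical.

```python
import numpy as np

def estimate_offset(a: np.ndarray, b: np.ndarray, rate_hz: float) -> float:
    """Estimate a constant offset (seconds) between two 1-D signals at the
    same sample rate via the cross-correlation peak. A positive result
    means events in `a` occur later than the same events in `b`."""
    a = (a - a.mean()) / (a.std() + 1e-8)  # zero-mean, unit-variance
    b = (b - b.mean()) / (b.std() + 1e-8)
    xcorr = np.correlate(a, b, mode="full")
    lag = int(np.argmax(xcorr)) - (len(b) - 1)
    return lag / rate_hz

# Example (hypothetical inputs): IMU acceleration magnitude vs. per-frame
# optical-flow energy, both resampled to 50 Hz.
# offset_s = estimate_offset(imu_energy, flow_energy, rate_hz=50.0)
```

Reporting the distribution of such offsets across sessions would give the "temporal offset statistics" the response promises.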

Circularity Check

0 steps flagged

No significant circularity in dataset release paper

full rationale

This is a dataset presentation paper with no mathematical derivations, fitted parameters, predictions, or load-bearing self-citations. The central contribution is the public release of BasketHAR (with IMU, physiological, and video streams) plus a baseline alignment method; the claim of utility is directly supported by the Apache 2.0 Hugging Face release and does not reduce to any internal fit or self-referential definition. No steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Dataset papers rest on standard assumptions about sensor accuracy and annotation quality rather than new axioms or parameters.

pith-pipeline@v0.9.0 · 5522 in / 969 out tokens · 37046 ms · 2026-05-10T06:33:06.332824+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

24 extracted references · 21 canonical work pages · 2 internal anchors

  1. [1]

    Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge L. Reyes-Ortiz. 2013. A Public Domain Dataset for Human Activity Recognition Using Smartphones. Computational Intelligence (2013).

  2. [2]

    Sara Ashry, Tetsuji Ogawa, and Walid Gomaa. 2020. CHARM-Deep: Continuous Human Activity Recognition Model Based on Deep Neural Network Using IMU Sensors of Smartwatch. IEEE Sensors Journal 20, 15 (Aug. 2020), 8757–8770. doi:10.1109/JSEN.2020.2985374

  3. [3]

    Eduardo Casilari, Jose A. Santoyo-Ramón, and Jose M. Cano-García. 2017. UMAFall: A Multisensor Dataset for the Research on Automatic Fall Detection. Procedia Computer Science 110 (2017), 32–39. doi:10.1016/j.procs.2017.06.110

  4. [4]

    Ricardo Chavarriaga, Hesam Sagha, Alberto Calatroni, Sundara Tejaswi Digumarti, Gerhard Tröster, José Del R. Millán, and Daniel Roggen. 2013. The Opportunity Challenge: A Benchmark Database for on-Body Sensor-Based Activity Recognition. Pattern Recognition Letters 34, 15 (Nov. 2013), 2033–2042. doi:10.1016/j.patrec.2012.12.014

  5. [5]

    Xian Gao, Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Zongyun Zhang, Ting Liu, and Yuzhuo Fu. 2025. From Motion Signals to Insights: A Unified Framework for Student Behavior Analysis and Feedback in Physical Education Classes. arXiv:2503.06525

  6. [6]

    Daniel Garcia-Gonzalez, Daniel Rivero, Enrique Fernandez-Blanco, and Miguel R. Luaces. 2020. A Public Domain Dataset for Real-Life Human Activity Recognition Using Smartphone Sensors. Sensors 20, 8 (April 2020), 2200. doi:10.3390/s20082200

  7. [7]

    Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. 2023. ImageBind: One Embedding Space to Bind Them All. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Vancouver, BC, Canada, 15180–15190. doi:10.1109/CVPR52729.2023.01457

  8. [8]

    Yu Guan and Thomas Plötz. 2017. Ensembles of Deep LSTM Learners for Activity Recognition Using Wearables. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 2 (June 2017), 1–28. doi:10.1145/3090076

  9. [9]

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-Rank Adaptation of Large Language Models. doi:10.48550/arXiv.2106.09685 arXiv:2106.09685 [cs]

  10. [10]

    Wenbo Huang, Lei Zhang, Wenbin Gao, Fuhong Min, and Jun He. 2021. Shallow Convolutional Neural Networks for Human Activity Recognition Using Wearable Sensors. IEEE Transactions on Instrumentation and Measurement 70 (2021), 1–11. doi:10.1109/TIM.2021.3091990

  11. [11]

    Masaya Inoue, Sozo Inoue, and Takeshi Nishida. 2018. Deep Recurrent Neural Network for Mobile Human Activity Recognition with High Throughput. Artificial Life and Robotics 23, 2 (June 2018), 173–185. doi:10.1007/s10015-017-0422-x

  12. [12]

    Nobuo Kawaguchi, Nobuhiro Ogawa, Yohei Iwasaki, Katsuhiko Kaji, Tsutomu Terada, Kazuya Murao, Sozo Inoue, Yoshihiro Kawahara, Yasuyuki Sumi, and Nobuhiko Nishio. 2011. HASC Challenge: Gathering Large Scale Human Activity Corpus for the Real-World Activity Understandings. In Proceedings of the 2nd Augmented Human International Conference. ACM, Tokyo, Japan, ...

  13. [13]

    Diederik P. Kingma and Jimmy Ba. 2017. Adam: A Method for Stochastic Optimization. doi:10.48550/arXiv.1412.6980 arXiv:1412.6980 [cs]

  14. [14]

    Jennifer R. Kwapisz, Gary M. Weiss, and Samuel A. Moore. 2011. Activity Recognition Using Cell Phone Accelerometers. ACM SIGKDD Explorations Newsletter 12, 2 (March 2011), 74–82. doi:10.1145/1964897.1964918

  15. [15]

    Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, and Babak Damavandi. 2022. IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text. doi:10.48550/arXiv.2210.14395 arXiv:2210.14395 [cs]

  16. [16]

    Abdulmajid Murad and Jae-Young Pyun. 2017. Deep Recurrent Neural Networks for Human Activity Recognition. Sensors 17, 11 (Nov. 2017), 2556. doi:10.3390/s17112556

  17. [17]

    Jorge-Luis Reyes-Ortiz, Luca Oneto, Alessandro Ghio, Albert Samá, Davide Anguita, and Xavier Parra. 2014. Human Activity Recognition on Smartphones with Awareness of Basic Activities and Postural Transitions. In Artificial Neural Networks and Machine Learning – ICANN 2014, Stefan Wermter, Cornelius Weber, Włodzisław Duch, Timo Honkela, Petia Koprinkova-H...

  18. [18]

    Charissa Ann Ronao and Sung-Bae Cho. 2016. Human Activity Recognition with Smartphone Sensors Using Deep Learning Neural Networks. Expert Systems with Applications 59 (Oct. 2016), 235–244. doi:10.1016/j.eswa.2016.04.032

  19. [19]

    Swapnil Sayan Saha, Shafizur Rahman, Miftahul Jannat Rasna, A.K.M. Mahfuzul Islam, and Md. Atiqur Rahman Ahad. 2018. DU-MD: An Open-Source Human Action Dataset for Ubiquitous Wearable Sensors. In 2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recogn... doi:10.1109/ICIEV.2018.8641051

  21. [21]

    Niloy Sikder and Abdullah-Al Nahid. 2021. KU-HAR: An Open Dataset for Heterogeneous Human Activity Recognition. Pattern Recognition Letters 146 (June 2021), 46–54. doi:10.1016/j.patrec.2021.02.024

  22. [22]

    Allen Y. Yang, Roozbeh Jafari, S. Shankar Sastry, and Ruzena Bajcsy. 2009. Distributed Recognition of Human Actions Using Wearable Motion Sensor Networks. Journal of Ambient Intelligence and Smart Environments 1, 2 (2009), 103–115. doi:10.3233/AIS-2009-0016

  23. [23]

    Ming Zeng, Le T. Nguyen, Bo Yu, Ole J. Mengshoel, Jiang Zhu, Pang Wu, and Joy Zhang. 2014. Convolutional Neural Networks for Human Activity Recognition Using Mobile Sensors. In 6th International Conference on Mobile Computing, Applications and Services. 197–205. doi:10.4108/icst.mobicase.2014.257786

  24. [24]

    Mi Zhang and Alexander A. Sawchuk. 2012. USC-HAD: A Daily Activity Dataset for Ubiquitous Activity Recognition Using Wearable Sensors. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, Pittsburgh, Pennsylvania, 1036–1043. doi:10.1145/2370216.2370438