arxiv: 2506.03198 · v4 · submitted 2025-06-02 · 💻 cs.CV · cs.AI

FLEX: A Largescale Multimodal, Multiview Dataset for Learning Structured Representations for Fitness Action Quality Assessment

Pith reviewed 2026-05-19 11:36 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords action quality assessment · fitness · multimodal dataset · multiview · surface electromyography · knowledge graph · 3D pose · video question answering

The pith

Multimodal multiview data with sEMG and knowledge graphs improves fitness action quality assessment

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FLEX, a new large-scale dataset for action quality assessment in fitness activities such as weight training. It includes over 7,500 multiview recordings from 38 subjects performing 20 exercises, with synchronized RGB video, 3D pose, surface electromyography signals, and physiological data. Annotations are structured using a Fitness Knowledge Graph that connects actions to key steps, errors, and feedback for interpretable scoring. Baseline experiments show that incorporating multimodal inputs, multiview perspectives, and these fine-grained annotations leads to better performance in evaluating action quality. This development supports more effective AI-based feedback systems that could help users improve form and avoid injuries during workouts.

Core claim

FLEX is the first large-scale multimodal multiview dataset for fitness AQA incorporating sEMG, with expert annotations organized into a Fitness Knowledge Graph supporting compositional scoring. It enables multimodal fusion, cross-modal prediction such as Video→EMG, and the FLEX-VideoQA benchmark for hierarchical queries. Baseline experiments demonstrate that multimodal inputs, multiview video, and fine-grained annotations significantly enhance AQA performance.

What carries the argument

The Fitness Knowledge Graph (FKG) that links actions, key steps, error types, and feedback to enable structured, interpretable quality assessment and compositional scoring.
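
To make the structure concrete, here is a minimal sketch of how such a graph and a compositional score could be represented. The class names, the feedback strings, the penalty weights, and the "start from a maximum and subtract a penalty per observed error" rule are all illustrative assumptions; this page does not state the paper's actual scoring function. Only the exercise and the error names are borrowed from the Figure 6 caption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ErrorType:
    name: str
    feedback: str   # hypothetical corrective cue
    penalty: float  # hypothetical weight, not taken from the paper


@dataclass
class KeyStep:
    name: str
    errors: list[ErrorType] = field(default_factory=list)


@dataclass
class Action:
    name: str
    steps: list[KeyStep] = field(default_factory=list)


def compositional_score(action: Action, observed: set[str],
                        max_score: float = 10.0) -> tuple[float, list[str]]:
    """Assumed rule: subtract the penalty of every observed error, floor at 0."""
    total = max_score
    feedback = []
    for step in action.steps:
        for err in step.errors:
            if err.name in observed:
                total -= err.penalty
                feedback.append(f"{step.name}: {err.feedback}")
    return max(total, 0.0), feedback


# Action -> key steps -> error types -> feedback, with made-up weights.
press = Action(
    name="barbell overhead press",
    steps=[
        KeyStep("preparation", [
            ErrorType("narrow stance", "Widen your stance.", 1.0),
            ErrorType("open grip", "Close your grip around the bar.", 0.5),
        ]),
        KeyStep("press", [
            ErrorType("trunk swaying", "Keep the trunk still.", 1.5),
        ]),
    ],
)

result, feedback = compositional_score(press, {"narrow stance", "trunk swaying"})
print(result)  # 7.5
for line in feedback:
    print(line)
```

Because each deduction is tied to a named error on a named step, the score comes with its own explanation, which is the sense in which a graph-backed score is interpretable.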

If this is right

  • Multimodal inputs significantly enhance AQA performance.
  • Multiview video contributes to improved assessment accuracy.
  • Fine-grained annotations from the knowledge graph boost results.
  • New tasks such as predicting EMG signals from video are supported.
  • The VideoQA benchmark promotes cross-modal reasoning in vision-language models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI-powered fitness coaching systems could provide real-time form corrections using similar multimodal setups.
  • The structured annotations may facilitate transfer of quality assessment models to other physical training domains.
  • Integration with wearable sensors could extend this approach to everyday exercise monitoring.
  • Cross-modal learning from this data might uncover new biomechanical relationships between observed form and muscle activity.

Load-bearing premise

The data collected from 38 subjects and their expert Fitness Knowledge Graph annotations represent the diversity of skill levels, error patterns, and conditions necessary for models to generalize to real-world use.

What would settle it

A test showing that AQA models trained with only single-view RGB video perform as well as or better than those using the full multimodal and multiview FLEX data on new fitness recordings.
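
One way to run such a comparison is to rank-correlate each model's predicted scores with expert scores on the held-out recordings. The sketch below uses Spearman's rank correlation, a common choice in the AQA literature; whether FLEX reports this exact metric is an assumption, and every number in it is toy data, not a result from the paper.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Toy numbers only: expert scores and two hypothetical models' predictions.
expert_scores = [6.0, 7.5, 8.0, 9.0, 9.5]
pred_single_view = [7.0, 6.5, 8.1, 9.4, 9.2]
pred_full = [6.2, 7.1, 8.3, 8.9, 9.6]

rho_single = spearman(expert_scores, pred_single_view)
rho_full = spearman(expert_scores, pred_full)
print(f"single-view rho = {rho_single:.2f}, full multimodal rho = {rho_full:.2f}")
```

The claim would be undermined if, on new recordings, the single-view correlation matched or exceeded the full-input correlation consistently across splits and seeds.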

Figures

Figures reproduced from arXiv: 2506.03198 by Hao Yin, Lijun Gu, Lin Xu, Paritosh Parmar, Tianxiao Guo, Tianyou Zheng, Weiwei Fu, Xiujin Liu, Yang Zhang.

Figure 1
Figure 1: An overview of the FLEX dataset. FLEX dataset consists of a core group of 38 subjects, each performing 20 different fitness actions, repeating each action 10 times. Each action repeat was recorded from 5 viewpoints, and sEMG signals and physiological parameters (heart rate, breath rate) were simultaneously collected along with videos. The data annotations contain rich text information such as action knots (A…
Figure 2
Figure 2: Data collection environment. Four cinema cameras and one smartphone were fixed at the four corners of the collection area. Video, sEMG, heart rate, and breath rate are recorded synchronously during collection. Multimodal information. Existing datasets predominantly include modalities such as images[1], texts[12], skeletal points[5], and audio[42], with limited exploration of other potentially valuable phys…
Figure 3
Figure 3: Annotation Process. The annotators we recruited were trained according to the sources of the annotation guidelines and underwent centralized training to ensure they thoroughly understood the rules. The video data was segmented following predetermined criteria, and a two-stage annotation process was implemented to reduce annotation errors and mitigate subjective bias. of trainers with over three years of ex…
Figure 4
Figure 4: The overview of the FLEX actions. 7.2 Subject Recruitment Humans, as the core component in action performance, directly influence action quality. To collect more comprehensive data, the FLEX dataset required subjects across various capability levels compared with datasets that only contained professional-level subjects. So, we extensively recruited subjects within our institution and local commercial gyms,…
Figure 5
Figure 5: The overview of the FLEX knowledge graph. (a) Visualization of frequently used annotation words. (b) FLEX-KG: the structure of the knowledge graph. (c1) Mapping between actions and action knots. (c2) Mapping between action knots and error types. 7.5 Annotator Recruitment Due to the vast volume of the FLEX dataset and stringent annotation quality requirements, we recruited 16 professional practitioners from…
Figure 6
Figure 6: Visualization of exemplary errors and scoring during one of the exercises, the barbell overhead press. Several key errors were observed that could compromise form and effectiveness. First, in the preparation process, the stance was too narrow and the grip was open rather than closed. When pressing overhead, trunk swaying and excessive elbow hyperextension were noted. The barbell was lowered too quickly, while e…
Figure 7
Figure 7: (a) Sample number of 20 actions. (b) Average duration and score of 20 actions. (c) Overall…
Figure 8
Figure 8: Weight of subjects loaded in fitness actions. FLEX comprises 20 weight-loaded fitness actions evenly divided between barbells and dumbbells. For barbells, the intrinsic weight of the bar (20 kg) is included in the calculation, whereas for dumbbells, only the single-sided weight is considered. In the figure, the X-axis denotes different subjects, and the Y-axis indicates the various actions. The color inten…
Figure 9
Figure 9: The construction of FLEX-VideoQA dataset. We designed a dialogue template following the pipeline “action recognition → action standards → action evaluation → action scoring,” with all questions and reference answers automatically generated from our annotation rules and results. In particular, action-evaluation answers were pre-generated by DeepseekV3 by combining video samples, action knots, error types, a…
Figure 10
Figure 10: The result of croissant checker.
Original abstract

Action Quality Assessment (AQA) -- the task of quantifying how well an action is performed -- has great potential for detecting errors in gym weight training, where accurate feedback is critical to prevent injuries and maximize gains. Existing AQA datasets, however, are limited to single-view competitive sports and RGB video, lacking multimodal signals and professional assessment of fitness actions. We introduce FLEX, the first large-scale, multimodal, multiview dataset for fitness AQA that incorporates surface electromyography (sEMG). FLEX contains over 7,500 multiview recordings of 20 weight-loaded exercises performed by 38 subjects of diverse skill levels, with synchronized RGB video, 3D pose, sEMG, and physiological signals. Expert annotations are organized into a Fitness Knowledge Graph (FKG) linking actions, key steps, error types, and feedback, supporting a compositional scoring function for interpretable quality assessment. FLEX enables multimodal fusion, cross-modal prediction -- including the novel Video→EMG task -- and biomechanically oriented representation learning. Building on the FKG, we further introduce FLEX-VideoQA, a structured question-answering benchmark with hierarchical queries that drive cross-modal reasoning in vision-language models. Baseline experiments demonstrate that multimodal inputs, multiview video, and fine-grained annotations significantly enhance AQA performance. FLEX thus advances AQA toward richer multimodal settings and provides a foundation for AI-powered fitness assessment and coaching. Dataset and code are available at https://github.com/HaoYin116/FLEX. Project page: https://haoyin116.github.io/FLEX_Dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces FLEX, a large-scale multimodal multiview dataset for fitness action quality assessment (AQA) consisting of over 7,500 synchronized recordings (RGB video, 3D pose, sEMG, physiological signals) of 20 weight-loaded exercises performed by 38 subjects of varying skill levels. Expert annotations are structured via a Fitness Knowledge Graph (FKG) supporting compositional scoring; the work also releases the FLEX-VideoQA benchmark and reports baseline results claiming that multimodal fusion, multiview inputs, and fine-grained FKG annotations yield significant AQA performance gains over unimodal or single-view alternatives.

Significance. If the reported baseline improvements are shown to hold under subject-disjoint evaluation protocols and the 38-subject cohort adequately samples error patterns and skill variation, FLEX would constitute a valuable addition to the AQA literature by moving beyond single-view RGB sports datasets and enabling cross-modal tasks such as Video-to-EMG prediction. The provision of the FKG and the associated VideoQA benchmark further supports interpretable, biomechanically grounded modeling.

major comments (2)
  1. [Baseline experiments] Baseline experiments section: the manuscript does not state whether train/test splits are subject-disjoint. With only 38 subjects, any subject overlap would allow models to exploit person-specific sEMG signatures, movement idiosyncrasies, or annotation biases rather than learning transferable skill representations, directly undermining the central claim that multimodal and multiview inputs produce generalizable AQA improvements.
  2. [Dataset construction] Dataset description and Table 1 (or equivalent subject statistics): no breakdown is provided of how the 38 subjects are distributed across skill levels, nor are inter-annotator agreement statistics or error-type coverage reported for the FKG. These omissions make it impossible to assess whether the weakest assumption—that the recordings represent the range of real-world fitness errors—holds.
minor comments (2)
  1. [Title] Title: 'Largescale' should be hyphenated as 'Large-scale'; the abstract already uses 'large-scale'.
  2. [Abstract] The abstract states that baselines 'significantly enhance AQA performance' yet supplies no numerical deltas, error bars, or statistical tests; these quantitative details should appear in the abstract or be clearly cross-referenced to the results tables.
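
The subject-disjoint protocol raised in major comment 1 can be sketched as follows. This is an illustration of the general technique, not the authors' split: the 20% test fraction, the seed, and the toy index of 38 subjects × 20 actions × 10 repetitions (taken from the Figure 1 caption) are assumptions.

```python
import random


def subject_disjoint_split(subject_ids, test_fraction=0.2, seed=0):
    """Partition by subject, not by recording, so no person is in both sets."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


# Toy stand-in for the dataset index: (subject, action, repetition) triples.
recordings = [(subj, act, rep)
              for subj in range(38)
              for act in range(20)
              for rep in range(10)]
subject_of = [r[0] for r in recordings]

train_idx, test_idx = subject_disjoint_split(subject_of)
train_subjects = {subject_of[i] for i in train_idx}
test_subjects = {subject_of[i] for i in test_idx}
print(len(train_subjects), "train subjects,", len(test_subjects), "test subjects")
```

A random split over recordings would instead place repetitions of the same person on both sides, which is exactly the leakage the referee describes.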

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. These have helped us identify important clarifications needed to strengthen the presentation of our work. We respond to each major comment below and commit to revisions that directly address the concerns raised.

point-by-point responses
  1. Referee: [Baseline experiments] Baseline experiments section: the manuscript does not state whether train/test splits are subject-disjoint. With only 38 subjects, any subject overlap would allow models to exploit person-specific sEMG signatures, movement idiosyncrasies, or annotation biases rather than learning transferable skill representations, directly undermining the central claim that multimodal and multiview inputs produce generalizable AQA improvements.

    Authors: We agree that subject-disjoint splits are essential for validating generalizable AQA improvements, especially with a modest cohort size. Our baseline experiments were conducted using subject-disjoint train/test splits to prevent leakage of person-specific patterns. This protocol was followed but not explicitly documented in the section. We will revise the baseline experiments section to clearly state that all reported results use subject-disjoint splits, provide the exact split ratios, and describe the subject partitioning procedure. revision: yes

  2. Referee: [Dataset construction] Dataset description and Table 1 (or equivalent subject statistics): no breakdown is provided of how the 38 subjects are distributed across skill levels, nor are inter-annotator agreement statistics or error-type coverage reported for the FKG. These omissions make it impossible to assess whether the weakest assumption—that the recordings represent the range of real-world fitness errors—holds.

    Authors: We thank the referee for pointing out these omissions. We will expand the dataset description and update Table 1 to include a breakdown of the 38 subjects by skill level (beginner, intermediate, advanced) as assessed by experts. We will also add inter-annotator agreement statistics (e.g., Cohen's kappa) for the FKG annotations. For error-type coverage, we will include a summary of the error categories and their frequencies in the dataset to better demonstrate representation of real-world fitness errors. revision: yes
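
For readers unfamiliar with the agreement statistic named in the second response, the sketch below computes Cohen's kappa for two annotators labeling the same items. The labels are invented for illustration, and how the authors would aggregate agreement across their full annotator pool is not stated here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labeled at random with their own
    # marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four clips: was a given error present?
annotator_1 = ["ok", "ok", "error", "error"]
annotator_2 = ["ok", "error", "error", "error"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)  # 0.5
```

Here the annotators agree on 3 of 4 clips (0.75), but 0.5 agreement is expected by chance from their label frequencies, so kappa is (0.75 − 0.5) / (1 − 0.5) = 0.5.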

Circularity Check

0 steps flagged

No circularity: empirical dataset and benchmarking paper

full rationale

This paper introduces a new multimodal fitness AQA dataset (FLEX) with recordings, sEMG, 3D pose, and Fitness Knowledge Graph annotations from 38 subjects, then reports baseline experiments on multimodal fusion and VideoQA. No mathematical derivations, equations, or predictions are present that could reduce to fitted parameters or self-defined quantities by construction. The central claims rest on data collection and empirical performance lifts, which are independent of any internal definitions or self-citation chains. No load-bearing steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central contribution is empirical data collection and a new annotation graph rather than new mathematical axioms, free parameters, or derivations from prior literature.

invented entities (1)
  • Fitness Knowledge Graph (FKG) · no independent evidence
    purpose: Organize expert annotations linking actions, key steps, error types, and feedback to enable compositional and interpretable quality scoring.
    New structure introduced to support structured representation learning and the VideoQA benchmark.

pith-pipeline@v0.9.0 · 5866 in / 1212 out tokens · 60069 ms · 2026-05-19T11:36:30.904008+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos

    cs.CV 2026-04 unverdicted novelty 7.0

    ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Assessing the quality of actions

    Hamed Pirsiavash, Carl Vondrick, and Antonio Torralba. Assessing the quality of actions. In European Conference on Computer Vision, pages 556–571. Springer, 2014. 3, 4, 5, 19

  2. [2]

    Learning to score olympic events

    Paritosh Parmar and Brendan Tran Morris. Learning to score olympic events. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 20–28,

  3. [3]

    Jhu-isi gesture and skill assessment working set (jigsaws): A surgical activity dataset for human motion modeling

    Yixin Gao, S Swaroop Vedula, Carol E Reiley, Narges Ahmidi, Balakrishnan Varadarajan, Henry C Lin, Lingling Tao, Luca Zappella, Benjamín Béjar, and David D Yuh. Jhu-isi gesture and skill assessment working set (jigsaws): A surgical activity dataset for human motion modeling. In Medical Image Computing and Computer Assisted Intervention Workshop, volume ...

  4. [4]

    A data set of human body movements for physical rehabilitation exercises

    Aleksandar Vakanski, Hyung-pil Jun, David Paul, and Russell Baker. A data set of human body movements for physical rehabilitation exercises. Data, 3(1):2, 2018. 3

  5. [5]

    The kimore dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation

    Marianna Capecci, Maria Gabriella Ceravolo, Francesco Ferracuti, Sabrina Iarlori, Andrea Monteriu, Luca Romeo, and Federica Verdini. The kimore dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(7):1436–1448, 2019. 3, 5

  6. [6]

    Domain knowledge-informed self-supervised representations for workout form assessment

    Paritosh Parmar, Amol Gharat, and Helge Rhodin. Domain knowledge-informed self-supervised representations for workout form assessment. In European Conference on Computer Vision, pages 105–123. Springer, 2022. 3, 4, 15

  7. [7]

    Egoexo-fitness: Towards egocentric and exocentric full-body action understanding

    Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, and Wei-Shi Zheng. Egoexo-fitness: Towards egocentric and exocentric full-body action understanding. In European Conference on Computer Vision, 2024. 3, 4, 5, 15, 16

  8. [8]

    Temporal distance matrices for squat classification

    Ryoji Ogata, Edgar Simo-Serra, Satoshi Iizuka, and Hiroshi Ishikawa. Temporal distance matrices for squat classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. 3, 4, 15

  9. [9]

    Assembly101: A large-scale multi-view video dataset for understanding procedural activities

    Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, and Angela Yao. Assembly101: A large-scale multi-view video dataset for understanding procedural activities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21096–21106, 2022. 3

  10. [10]

    Gaia: Rethinking action quality assessment for ai-generated videos

    Zijian Chen, Wei Sun, Yuan Tian, Jun Jia, Zicheng Zhang, Jiarui Wang, Ru Huang, Xiongkuo Min, Guangtao Zhai, and Wenjun Zhang. Gaia: Rethinking action quality assessment for ai-generated videos. In Advances in Neural Information Processing Systems, 2024. 3

  11. [11]

    Action quality assessment across multiple actions

    Paritosh Parmar and Brendan Morris. Action quality assessment across multiple actions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1468–1476. IEEE, 2019. 3

  12. [12]

    What and how well you performed? a multitask learning approach to action quality assessment

    Paritosh Parmar and Brendan Tran Morris. What and how well you performed? a multitask learning approach to action quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 304–313, 2019. 3, 5

  13. [13]

    Learning to score figure skating sport videos

    Chengming Xu, Yanwei Fu, Bing Zhang, Zitian Chen, Yu-Gang Jiang, and Xiangyang Xue. Learning to score figure skating sport videos. IEEE Transactions on Circuits and Systems for Video Technology, 30(12):4578–4590, 2019. 3

  14. [14]

    An asymmetric modeling for action assessment

    Jibin Gao, Wei-Shi Zheng, Jia-Hui Pan, Chengying Gao, Yaowei Wang, Wei Zeng, and Jianhuang Lai. An asymmetric modeling for action assessment. In European Conference on Computer Vision, pages 222–238. Springer, 2020. 3

  15. [15]

    Hybrid dynamic-static context-aware attention network for action assessment in long videos

    Ling-An Zeng, Fa-Ting Hong, Wei-Shi Zheng, Qi-Zhi Yu, Wei Zeng, Yao-Wei Wang, and Jian-Huang Lai. Hybrid dynamic-static context-aware attention network for action assessment in long videos. In Proceedings of the ACM International Conference on Multimedia, pages 2526–2534, 2020. 3

  16. [16]

    Tsa-net: Tube self-attention network for action quality assessment

    Shunli Wang, Dingkang Yang, Peng Zhai, Chixiao Chen, and Lihua Zhang. Tsa-net: Tube self-attention network for action quality assessment. In Proceedings of the ACM International Conference on Multimedia, pages 4902–4910, 2021. 3

  17. [17]

    Finediving: A fine-grained dataset for procedure-aware action quality assessment

    Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, and Jiwen Lu. Finediving: A fine-grained dataset for procedure-aware action quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2949–2958, 2022. 3, 4

  18. [18]

    Logo: A long-form video dataset for group action quality assessment

    Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, and Yansong Tang. Logo: A long-form video dataset for group action quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2405–2414, 2023. 3

  19. [19]

    Localization-assisted uncertainty score disentanglement network for action quality assessment

    Yanli Ji, Lingfeng Ye, Huili Huang, Lijing Mao, Yang Zhou, and Lingling Gao. Localization-assisted uncertainty score disentanglement network for action quality assessment. In Proceedings of the ACM International Conference on Multimedia, pages 8590–8597, 2023. 3

  20. [20]

    Automatic modelling for interactive action assessment

    Jibin Gao, Jia-Hui Pan, Shao-Jie Zhang, and Wei-Shi Zheng. Automatic modelling for interactive action assessment. International Journal of Computer Vision, 131(3):659–679, 2023. 3

  21. [21]

    Lucidaction: A hierarchical and multi-model dataset for comprehensive action quality assessment

    Linfeng Dong, Wei Wang, Yu Qiao, and Xiao Sun. Lucidaction: A hierarchical and multi-model dataset for comprehensive action quality assessment. In Advances in Neural Information Processing Systems, 2024. 3, 4

  22. [22]

    Current developments in surface electromyography

    Veysel ALCAN and Murat ZİNNUROĞLU. Current developments in surface electromyography. Turkish Journal of Medical Sciences, 53(5):1019–1031, 2023. 3

  23. [23]

    Extracting time-frequency feature of single-channel vastus medialis emg signals for knee exercise pattern recognition

    Yi Zhang, Peiyang Li, Xuyang Zhu, Steven W Su, Qing Guo, Peng Xu, and Dezhong Yao. Extracting time-frequency feature of single-channel vastus medialis emg signals for knee exercise pattern recognition. PloS one, 12(7):e0180526, 2017. 3

  24. [24]

    Individuals have unique muscle activation signatures as revealed during gait and pedaling

    François Hug, Clément Vogel, Kylie Tucker, Sylvain Dorel, Thibault Deschamps, Éric Le Carpentier, and Lilian Lacourpaille. Individuals have unique muscle activation signatures as revealed during gait and pedaling. Journal of Applied Physiology, 127(4):1165–1174, 2019. 3

  25. [25]

    Muscle activation patterns are more constrained and regular in treadmill than in overground human locomotion

    Ilaria Mileti, Aurora Serra, Nerses Wolf, Victor Munoz-Martel, Antonis Ekizos, Eduardo Palermo, Adamantios Arampatzis, and Alessandro Santuz. Muscle activation patterns are more constrained and regular in treadmill than in overground human locomotion. Frontiers in Bioengineering and Biotechnology, 8:581619, 2020. 3

  26. [26]

    A large calibrated database of hand movements and grasps kinematics

    Néstor J Jarque-Bou, Manfredo Atzori, and Henning Müller. A large calibrated database of hand movements and grasps kinematics. Scientific data, 7(1):12, 2020. 3

  27. [27]

    Sex-specific tuning of modular muscle activation patterns for locomotion in young and older adults

    Alessandro Santuz, Lars Janshen, Leon Brüll, Victor Munoz-Martel, Juri Taborri, Stefano Rossi, and Adamantios Arampatzis. Sex-specific tuning of modular muscle activation patterns for locomotion in young and older adults. PLoS One, 17(6):e0269417, 2022. 3

  28. [28]

    semg dataset of routine activities

    Asad Mansoor Khan, Sajid Gul Khawaja, Muhammad Usman Akram, and Ali Saeed Khan. semg dataset of routine activities. Data in brief, 33:106543, 2020. 3

  29. [29]

    Hristo Dimitrov, Anthony M. J. Bull, and Dario Farina. High-density EMG, IMU, kinetic, and kinematic open-source data for comprehensive locomotion activities. Scientific Data, 10(1):1–10, 2023. 3

  30. [30]

    A comparison of neural control of the biarticular gastrocnemius muscles between knee flexion and ankle plantar flexion

    Raphaël Hamard, Jeroen Aeles, Simon Avrillon, Taylor JM Dick, and François Hug. A comparison of neural control of the biarticular gastrocnemius muscles between knee flexion and ankle plantar flexion. Journal of Applied Physiology, 135(2):394–404, 2023. 3

  31. [31]

    A wearable real-time kinetic measurement sensor setup for human locomotion

    Huawei Wang, Akash Basu, Guillaume Durandau, and Massimo Sartori. A wearable real-time kinetic measurement sensor setup for human locomotion. Wearable technologies, 4:e11, 2023. 3

  32. [32]

    Electromyography data for non-invasive naturally-controlled robotic hand prostheses

    Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Barbara Caputo, Anne-Gabrielle Mittaz Hager, Simone Elsig, Giorgio Giatsidis, Franco Bassetto, and Henning Müller. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Scientific data, 1(1):1–13, 2014. 3

  33. [33]

    Neuropose: 3d hand pose tracking using emg wearables

    Yilin Liu, Shijia Zhang, and Mahanth Gowda. Neuropose: 3d hand pose tracking using emg wearables. In Proceedings of the Web Conference, pages 1471–1482, 2021. 3

  34. [34]

    Sensing the full dynamics of the human hand with a neural interface and deep learning

    Raul C Sîmpetru, Andreas Arkudas, Dominik I Braun, Marius Osswald, Daniela Souza de Oliveira, Bjoern Eskofier, Thomas M Kinfe, and Alessandro Del Vecchio. Sensing the full dynamics of the human hand with a neural interface and deep learning. BioRxiv, pages 2022–07, 2022. 3

  35. [35]

    Dataset for multi-channel surface electromyography (semg) signals of hand gestures

    Mehmet Akif Ozdemir, Deniz Hande Kisa, Onan Guren, and Aydin Akan. Dataset for multi-channel surface electromyography (semg) signals of hand gestures. Data in brief, 41:107921,

  36. [36]

    emg2pose: A large and diverse benchmark for surface electromyographic hand pose estimation

    Sasha Salter, Richard Warren, Collin Schlager, Adrian Spurr, Shangchen Han, Rohin Bhasin, Yujun Cai, Peter Walkington, Anuoluwapo Bolarinwa, Robert J Wang, et al. emg2pose: A large and diverse benchmark for surface electromyographic hand pose estimation. Advances in Neural Information Processing Systems, 37:55703–55728, 2024. 3, 9

  37. [37]

    Fastmove wireless emg, 2024

    FASTMOVE. Fastmove wireless emg, 2024. 3, 5

  38. [38]

    Fastmove 3d motion for realtime, 2024

    FASTMOVE. Fastmove 3d motion for realtime, 2024. 4

  39. [39]

    Zcam e2-m4, 2020

    ZCAM. Zcam e2-m4, 2020. 4

  40. [40]

    M.zuiko digital ed 14-150mm f4.0-5.6, 2022

    OLYMPUS. M.zuiko digital ed 14-150mm f4.0-5.6, 2022. 5

  41. [41]

    Oneplus 7, 2019

    OnePlus. Oneplus 7, 2019. 5

  42. [42]

    Piano skills assessment

    Paritosh Parmar, Jaiden Reddy, and Brendan Morris. Piano skills assessment. In IEEE International Workshop on Multimedia Signal Processing, pages 1–5. IEEE, 2021. 5

  43. [43]

    Flag3d: A 3d fitness activity dataset with language instruction

    Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, and Xiu Li. Flag3d: A 3d fitness activity dataset with language instruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22106–22117, 2023. 5

  44. [44]

    National occupational skill standard — social sports instructor (occupational code: 4-13-04-01)

    Ministry of Human Resources and Social Security of the People’s Republic of China and General Administration of Sport of China. National occupational skill standard — social sports instructor (occupational code: 4-13-04-01). Standard, Ministry of Human Resources and Social Security of the People’s Republic of China and General Administration of Sport of C...

  45. [45]

    Occupational Competency Training Textbook for Social Sports Instructors—Fitness Coaches (with Technical Action Videos)

    Human Resources Development Center of the General Administration of Sport of China. Occupational Competency Training Textbook for Social Sports Instructors—Fitness Coaches (with Technical Action Videos). Higher Education Press, 2023. 6

  46. [46]

    Fitness and Bodybuilding Tutorial

    Beijing Sport University. Fitness and Bodybuilding Tutorial. Beijing Sport University Press,

  47. [47]

    Joe Weider’s Bodybuilding System

    Joe Weider. Joe Weider’s Bodybuilding System. Weider Pubns, 1998. 6

  48. [48]

    Group-aware contrastive regression for action quality assessment

    Xumin Yu, Yongming Rao, Wenliang Zhao, Jiwen Lu, and Jie Zhou. Group-aware contrastive regression for action quality assessment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7919–7928, 2021. 7, 8, 19, 20

  49. [49]

    Spatial temporal graph convolutional networks for skeleton-based action recognition

    Sijie Yan, Yuanjun Xiong, and Dahua Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018. 8

  50. [50]

    NTU RGB+D: A large scale dataset for 3D human activity analysis

    Amir Shahroudy, Jun Liu, Tian-Tsong Ng, and Gang Wang. NTU RGB+D: A large scale dataset for 3D human activity analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1010–1019, 2016.

  51. [51]

    Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923,

  52. [52]

    A decade of action quality assessment: Largest systematic survey of trends, challenges, and future directions

    Hao Yin, Paritosh Parmar, Daoliang Xu, Yang Zhang, Tianyou Zheng, and Weiwei Fu. A decade of action quality assessment: Largest systematic survey of trends, challenges, and future directions. arXiv, 2025.

  53. [53]

    Quo vadis, action recognition? A new model and the Kinetics dataset

    Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017.

  54. [54]

    LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

    Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, Zhangchi Feng, and Yongqiang Ma. LlamaFactory: Unified efficient fine-tuning of 100+ language models. arXiv preprint arXiv:2403.13372, 2024.

Checklist

    For all authors...
    (a) Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? [Yes] Please refer to section 1.
    (b) Did you describe the limitations of your work? [Yes] Please refer to section 5.
    (c) Did you discuss any potential negative societal impacts of your work? [Yes] Please refer to the Appe...

    If you are including theoretical results...
    (a) Did you state the full set of assumptions of all theoretical results? [NA]
    (b) Did you include complete proofs of all theoretical results? [NA]

    If you ran experiments (e.g. for benchmarks)...
    (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Please refer to the Appendix.
    (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Please...

    If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
    (a) If your work uses existing assets, did you cite the creators? [Yes] Please refer to section 4.
    (b) Did you mention the license of the assets? [Yes] Please refer to the Appendix.
    (c) Did you include any new assets either in the supplemental material or as a ...

    If you used crowdsourcing or conducted research with human subjects...
    (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [Yes] Please refer to the Appendix.
    (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [Yes] Please refer ...

    1. Kneeling Push-ups  2. Push-ups | 1. Pectoralis major  2. Anterior deltoid
    3. Kneeling Torso Twist  4. Knee Raise + Abs Contract | 3. Triceps brachii  4. External obliques
    5. Shoulder Bridge  6. Sit-ups | 5. Internal obliques  6. Rectus abdominis
    7. Leg Reverse Lunge  8. Leg Lunge with Knee Lift | 7. Iliopsoas  8. Gluteus maximus
    9. Sumo Squat  10. Jumping Jacks | 9. Hamstrings  10...

    1. Standing Barbell Overhead Press  2. Barbell Bicep Curl | 1. Pectoralis major  2. Anterior deltoid
    3. Barbell Upright Row  4. Dumbbell Front Raise | 3. Middle deltoid  4. Posterior deltoid
    5. Dumbbell Bicep Curl  6. Dumbbell Lateral Raise | 5. Triceps brachii  6. Biceps brachii
    7. Bent-Over Dumbbell Reverse Fly  8. Flat Barbell Bench Press | 7. Brachialis  8. Supraspinatus
    9. In...

    action recognition → action standards → action evaluation → action scoring,

    We also report the top 20 most frequent error types. Additionally, we provide the weight loads each subject uses for each action Figure 8.

    [Figure: per-action plots over actions A01 to A20; recoverable axis labels: Action, Sample Number, Duration...]