FLEX: A Largescale Multimodal, Multiview Dataset for Learning Structured Representations for Fitness Action Quality Assessment

Hao Yin; Lijun Gu; Lin Xu; Paritosh Parmar; Tianxiao Guo; Tianyou Zheng; Weiwei Fu; Xiujin Liu; Yang Zhang

arxiv: 2506.03198 · v4 · submitted 2025-06-02 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-19 11:36 UTC · model grok-4.3

keywords: action quality assessment · fitness · multimodal dataset · multiview · surface electromyography · knowledge graph · 3D pose · video question answering

The pith

Multimodal multiview data with sEMG and knowledge graphs improves fitness action quality assessment

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FLEX, a new large-scale dataset for action quality assessment in fitness activities such as weight training. It includes over 7,500 multiview recordings from 38 subjects performing 20 exercises, with synchronized RGB video, 3D pose, surface electromyography signals, and physiological data. Annotations are structured using a Fitness Knowledge Graph that connects actions to key steps, errors, and feedback for interpretable scoring. Baseline experiments show that incorporating multimodal inputs, multiview perspectives, and these fine-grained annotations leads to better performance in evaluating action quality. This development supports more effective AI-based feedback systems that could help users improve form and avoid injuries during workouts.

Core claim

FLEX is the first large-scale multimodal multiview dataset for fitness AQA incorporating sEMG, with expert annotations organized into a Fitness Knowledge Graph supporting compositional scoring. It enables multimodal fusion, cross-modal prediction like Video to EMG, and the FLEX-VideoQA benchmark for hierarchical queries. Baseline experiments demonstrate that multimodal inputs, multiview video, and fine-grained annotations significantly enhance AQA performance.

What carries the argument

The Fitness Knowledge Graph (FKG) that links actions, key steps, error types, and feedback to enable structured, interpretable quality assessment and compositional scoring.
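
A minimal sketch of how such a graph could drive compositional scoring, assuming a simple deduction scheme. The class names, penalty values, and feedback strings are hypothetical illustrations and do not come from the paper; only the action, step, and error examples echo the barbell overhead press case in Figure 6.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorType:
    name: str
    feedback: str   # hypothetical feedback text
    penalty: float  # hypothetical deduction, not a value from the paper


@dataclass
class KeyStep:
    name: str
    errors: dict = field(default_factory=dict)  # error name -> ErrorType


@dataclass
class Action:
    name: str
    steps: dict = field(default_factory=dict)  # step name -> KeyStep


def compositional_score(action, observed, full_score=10.0):
    """Subtract the penalty of each observed (step, error) pair from full_score."""
    total = full_score
    feedback = []
    for step_name, error_name in observed:
        error = action.steps[step_name].errors[error_name]
        total -= error.penalty
        feedback.append(f"{step_name}: {error.feedback}")
    return max(total, 0.0), feedback


# Toy graph fragment for one action (all numbers are assumptions).
press = Action("barbell overhead press", {
    "preparation": KeyStep("preparation", {
        "narrow_stance": ErrorType("narrow_stance", "widen your stance", 1.0),
        "open_grip": ErrorType("open_grip", "close your grip around the bar", 1.5),
    }),
    "press": KeyStep("press", {
        "trunk_sway": ErrorType("trunk_sway", "brace your core to stop swaying", 2.0),
    }),
})

score, notes = compositional_score(
    press, [("preparation", "narrow_stance"), ("press", "trunk_sway")]
)
print(score, notes)
```

Because each deduction is tied to a named step and error, the same lookup that produces the score also produces the feedback, which is the interpretability argument in miniature.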

If this is right

Multimodal inputs significantly enhance AQA performance.
Multiview video contributes to improved assessment accuracy.
Fine-grained annotations from the knowledge graph boost results.
New tasks such as predicting EMG signals from video are supported.
The VideoQA benchmark promotes cross-modal reasoning in vision-language models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

AI-powered fitness coaching systems could provide real-time form corrections using similar multimodal setups.
The structured annotations may facilitate transfer of quality assessment models to other physical training domains.
Integration with wearable sensors could extend this approach to everyday exercise monitoring.
Cross-modal learning from this data might uncover new biomechanical relationships between observed form and muscle activity.

Load-bearing premise

The data collected from 38 subjects and their expert Fitness Knowledge Graph annotations represent the diversity of skill levels, error patterns, and conditions necessary for models to generalize to real-world use.

What would settle it

A test showing that AQA models trained with only single-view RGB video perform as well as or better than those using the full multimodal and multiview FLEX data on new fitness recordings.
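
One way to run the comparison described above is to score both models against expert labels on held-out recordings and compare rank correlations. The sketch below assumes Spearman's rank correlation as the metric, a common choice in AQA work that this page does not itself name. All scores are made-up toy values, not FLEX results.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy expert scores and two sets of hypothetical model predictions.
expert = [9.0, 7.5, 6.0, 4.0, 8.0]
single_view_rgb = [8.0, 7.0, 7.5, 5.0, 6.0]
full_multimodal = [8.5, 7.0, 6.5, 4.5, 8.0]

rho_single = spearman(expert, single_view_rgb)
rho_full = spearman(expert, full_multimodal)
print(f"single-view RGB: {rho_single:.2f}, full multimodal: {rho_full:.2f}")
```

The claim would be in trouble if, on new recordings, the single-view correlation matched or exceeded the multimodal one.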

Figures

Figures reproduced from arXiv: 2506.03198 by Hao Yin, Lijun Gu, Lin Xu, Paritosh Parmar, Tianxiao Guo, Tianyou Zheng, Weiwei Fu, Xiujin Liu, Yang Zhang.

**Figure 1.** An overview of the FLEX dataset. The FLEX dataset consists of a core group of 38 subjects, each performing 20 different fitness actions and repeating each action 10 times. Each repetition was recorded from 5 viewpoints, and sEMG signals and physiological parameters (heart rate, breath rate) were collected simultaneously with the videos. The data annotations contain rich text information such as action knots (A…

**Figure 2.** Data collection environment. Four cinema cameras and one smartphone were fixed at the four corners of the collection area. Video, sEMG, heart rate, and breath rate are recorded synchronously during collection.

**Figure 3.** Annotation process. The annotators we recruited were trained according to the sources of the annotation guidelines and underwent centralized training to ensure they thoroughly understood the rules. The video data was segmented following predetermined criteria, and a two-stage annotation process was implemented to reduce annotation errors and mitigate subjective bias.

**Figure 4.** The overview of the FLEX actions.

**Figure 5.** The overview of the FLEX knowledge graph. (a) Visualization of frequently used annotation words. (b) FLEX-KG: the structure of the knowledge graph. (c1) Mapping between actions and action knots. (c2) Mapping between action knots and error types.

**Figure 6.** Visualization of exemplary errors and scoring during one of the exercises (barbell overhead press). Several key errors were observed that could compromise form and effectiveness. First, in the preparation process, the stance was too narrow and the grip was open rather than closed. When pressing overhead, trunk swaying and excessive elbow hyperextension were noted. The barbell was lowered too quickly, while e…

**Figure 7.** (a) Sample number of 20 actions. (b) Average duration and score of 20 actions. (c) Overall

**Figure 8.** Weight of subjects loaded in fitness actions. FLEX comprises 20 weight-loaded fitness actions evenly divided between barbells and dumbbells. For barbells, the intrinsic weight of the bar (20 kg) is included in the calculation, whereas for dumbbells, only the single-sided weight is considered. In the figure, the X-axis denotes different subjects, and the Y-axis indicates the various actions. The color inten…

**Figure 9.** The construction of FLEX-VideoQA dataset. We designed a dialogue template following the pipeline “action recognition → action standards → action evaluation → action scoring,” with all questions and reference answers automatically generated from our annotation rules and results. In particular, action-evaluation answers were pre-generated by DeepseekV3 by combining video samples, action knots, error types, a…

**Figure 10.** The result of croissant checker.
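
The Figure 9 caption describes a four-stage dialogue template filled in automatically from annotations. A rough sketch of that idea follows. The record fields, question wording, and answer formatting are all assumptions made for illustration, and the plain string join used for the evaluation answer stands in for the language-model generation step the caption mentions.

```python
# Stage names follow the pipeline quoted in the Figure 9 caption.
STAGES = (
    "action recognition",
    "action standards",
    "action evaluation",
    "action scoring",
)

# Hypothetical question wording; the paper's actual templates are not shown here.
QUESTIONS = (
    "Which fitness action is performed in the video?",
    "What are the key steps of this action?",
    "What errors does the subject make?",
    "What score would you give this repetition?",
)


def build_dialogue(record):
    """Turn one annotation record (hypothetical schema) into four QA turns."""
    knots = "; ".join(record["action_knots"])
    errors = "; ".join(record["errors"]) if record["errors"] else "no errors observed"
    answers = (
        record["action"],
        f"Key steps: {knots}.",
        f"Observed issues: {errors}.",
        f"{record['score']:.1f}",
    )
    return [
        {"stage": stage, "question": question, "answer": answer}
        for stage, question, answer in zip(STAGES, QUESTIONS, answers)
    ]


# Toy record; values are illustrative, not taken from the dataset.
example = {
    "action": "barbell overhead press",
    "action_knots": ["preparation", "press", "lowering"],
    "errors": ["stance too narrow", "trunk swaying"],
    "score": 7.0,
}

dialogue = build_dialogue(example)
for turn in dialogue:
    print(f"[{turn['stage']}] Q: {turn['question']} A: {turn['answer']}")
```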

Original abstract

Action Quality Assessment (AQA) -- the task of quantifying how well an action is performed -- has great potential for detecting errors in gym weight training, where accurate feedback is critical to prevent injuries and maximize gains. Existing AQA datasets, however, are limited to single-view competitive sports and RGB video, lacking multimodal signals and professional assessment of fitness actions. We introduce FLEX, the first large-scale, multimodal, multiview dataset for fitness AQA that incorporates surface electromyography (sEMG). FLEX contains over 7,500 multiview recordings of 20 weight-loaded exercises performed by 38 subjects of diverse skill levels, with synchronized RGB video, 3D pose, sEMG, and physiological signals. Expert annotations are organized into a Fitness Knowledge Graph (FKG) linking actions, key steps, error types, and feedback, supporting a compositional scoring function for interpretable quality assessment. FLEX enables multimodal fusion, cross-modal prediction -- including the novel Video→EMG task -- and biomechanically oriented representation learning. Building on the FKG, we further introduce FLEX-VideoQA, a structured question-answering benchmark with hierarchical queries that drive cross-modal reasoning in vision-language models. Baseline experiments demonstrate that multimodal inputs, multiview video, and fine-grained annotations significantly enhance AQA performance. FLEX thus advances AQA toward richer multimodal settings and provides a foundation for AI-powered fitness assessment and coaching. Dataset and code are available at https://github.com/HaoYin116/FLEX. Project page: https://haoyin116.github.io/FLEX_Dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

FLEX adds a useful new multimodal fitness AQA dataset with sEMG and a knowledge graph, but its performance claims need strict cross-subject checks to hold up.

The main point is that this paper releases FLEX, a new dataset of over 7,500 multiview recordings of 20 weight training exercises from 38 subjects, with synchronized RGB, 3D pose, sEMG, and physiological signals plus expert annotations structured in a Fitness Knowledge Graph. That combination is new for the AQA literature, which has mostly stuck to single-view RGB in competitive sports. The authors also define a Video-to-EMG prediction task and a hierarchical FLEX-VideoQA benchmark, and they release the data and code publicly. Those are concrete additions that researchers working on coaching tools or multimodal action understanding can build on directly. The baselines show gains when adding modalities and multiview inputs, which is consistent with what one would expect from richer signals. The Fitness Knowledge Graph approach for compositional scoring is a reasonable way to make the annotations more structured and interpretable.

The soft spot is the evaluation protocol. With only 38 subjects, any train-test overlap lets models exploit person-specific movement or EMG patterns rather than learning transferable skill or error representations. If the splits are not fully subject-disjoint, the reported lifts could shrink or disappear under leave-one-subject-out testing. The abstract does not give numbers or error bars, so the full paper needs to show the exact split details and controls.

This work is aimed at computer vision groups doing human action analysis or applied work in fitness and rehab. Readers who need a public multimodal fitness dataset or want to test cross-modal prediction will get value from it. It is substantial enough as a data contribution to deserve peer review, even if the baselines require tighter validation on generalization.
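
To make the split concern concrete, here is a small sketch of a subject-disjoint split and a leave-one-subject-out loop. The sample schema (a dict with a `subject` key), the 20% test fraction, and the toy data are assumptions for illustration; the paper's actual partitioning procedure is not described on this page.

```python
import random


def subject_disjoint_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that no subject appears in both train and test."""
    subjects = sorted({s["subject"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [s for s in samples if s["subject"] not in test_subjects]
    test = [s for s in samples if s["subject"] in test_subjects]
    return train, test


def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) with one subject held out per fold."""
    for held_out in sorted({s["subject"] for s in samples}):
        train = [s for s in samples if s["subject"] != held_out]
        test = [s for s in samples if s["subject"] == held_out]
        yield held_out, train, test


# Toy data: 38 subjects with two recordings each (not real FLEX samples).
samples = [
    {"subject": subject_id, "clip": f"s{subject_id:02d}_rep{rep}"}
    for subject_id in range(38)
    for rep in range(2)
]

train, test = subject_disjoint_split(samples)
train_ids = {s["subject"] for s in train}
test_ids = {s["subject"] for s in test}
folds = list(leave_one_subject_out(samples))
print(len(train_ids), len(test_ids), len(folds))
```

Splitting on subject IDs rather than on individual clips is what prevents a model from being tested on people it has already seen.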

Referee Report

2 major / 2 minor

Summary. The manuscript introduces FLEX, a large-scale multimodal multiview dataset for fitness action quality assessment (AQA) consisting of over 7,500 synchronized recordings (RGB video, 3D pose, sEMG, physiological signals) of 20 weight-loaded exercises performed by 38 subjects of varying skill levels. Expert annotations are structured via a Fitness Knowledge Graph (FKG) supporting compositional scoring; the work also releases the FLEX-VideoQA benchmark and reports baseline results claiming that multimodal fusion, multiview inputs, and fine-grained FKG annotations yield significant AQA performance gains over unimodal or single-view alternatives.

Significance. If the reported baseline improvements are shown to hold under subject-disjoint evaluation protocols and the 38-subject cohort adequately samples error patterns and skill variation, FLEX would constitute a valuable addition to the AQA literature by moving beyond single-view RGB sports datasets and enabling cross-modal tasks such as Video-to-EMG prediction. The provision of the FKG and the associated VideoQA benchmark further supports interpretable, biomechanically grounded modeling.

major comments (2)

[Baseline experiments] Baseline experiments section: the manuscript does not state whether train/test splits are subject-disjoint. With only 38 subjects, any subject overlap would allow models to exploit person-specific sEMG signatures, movement idiosyncrasies, or annotation biases rather than learning transferable skill representations, directly undermining the central claim that multimodal and multiview inputs produce generalizable AQA improvements.
[Dataset construction] Dataset description and Table 1 (or equivalent subject statistics): no breakdown is provided of how the 38 subjects are distributed across skill levels, nor are inter-annotator agreement statistics or error-type coverage reported for the FKG. These omissions make it impossible to assess whether the weakest assumption—that the recordings represent the range of real-world fitness errors—holds.

minor comments (2)

[Abstract] Abstract: 'Largescale' should be hyphenated as 'Large-scale'.
[Abstract] The abstract states that baselines 'significantly enhance AQA performance' yet supplies no numerical deltas, error bars, or statistical tests; these quantitative details should appear in the abstract or be clearly cross-referenced to the results tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. These have helped us identify important clarifications needed to strengthen the presentation of our work. We respond to each major comment below and commit to revisions that directly address the concerns raised.

Referee: [Baseline experiments] Baseline experiments section: the manuscript does not state whether train/test splits are subject-disjoint. With only 38 subjects, any subject overlap would allow models to exploit person-specific sEMG signatures, movement idiosyncrasies, or annotation biases rather than learning transferable skill representations, directly undermining the central claim that multimodal and multiview inputs produce generalizable AQA improvements.

Authors: We agree that subject-disjoint splits are essential for validating generalizable AQA improvements, especially with a modest cohort size. Our baseline experiments were conducted using subject-disjoint train/test splits to prevent leakage of person-specific patterns. This protocol was followed but not explicitly documented in the section. We will revise the baseline experiments section to clearly state that all reported results use subject-disjoint splits, provide the exact split ratios, and describe the subject partitioning procedure. revision: yes
Referee: [Dataset construction] Dataset description and Table 1 (or equivalent subject statistics): no breakdown is provided of how the 38 subjects are distributed across skill levels, nor are inter-annotator agreement statistics or error-type coverage reported for the FKG. These omissions make it impossible to assess whether the weakest assumption—that the recordings represent the range of real-world fitness errors—holds.

Authors: We thank the referee for pointing out these omissions. We will expand the dataset description and update Table 1 to include a breakdown of the 38 subjects by skill level (beginner, intermediate, advanced) as assessed by experts. We will also add inter-annotator agreement statistics (e.g., Cohen's kappa) for the FKG annotations. For error-type coverage, we will include a summary of the error categories and their frequencies in the dataset to better demonstrate representation of real-world fitness errors. revision: yes
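
For reference, Cohen's kappa, the agreement statistic named in the response above, compares observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A short sketch for two annotators follows; the labels are toy values, not FLEX annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy error-type labels from two hypothetical annotators.
annotator_1 = ["ok", "sway", "ok", "sway"]
annotator_2 = ["ok", "sway", "sway", "sway"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Cohen's kappa covers exactly two annotators; with more annotators per clip, a multi-rater statistic would be needed instead.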

Circularity Check

0 steps flagged

No circularity: empirical dataset and benchmarking paper

This paper introduces a new multimodal fitness AQA dataset (FLEX) with recordings, sEMG, 3D pose, and Fitness Knowledge Graph annotations from 38 subjects, then reports baseline experiments on multimodal fusion and VideoQA. No mathematical derivations, equations, or predictions are present that could reduce to fitted parameters or self-defined quantities by construction. The central claims rest on data collection and empirical performance lifts, which are independent of any internal definitions or self-citation chains. No load-bearing steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central contribution is empirical data collection and a new annotation graph rather than new mathematical axioms, free parameters, or derivations from prior literature.

invented entities (1)

Fitness Knowledge Graph (FKG) · no independent evidence
purpose: Organize expert annotations linking actions, key steps, error types, and feedback to enable compositional and interpretable quality scoring.
New structure introduced to support structured representation learning and the VideoQA benchmark.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Baseline experiments demonstrate that multimodal inputs, multiview video, and fine-grained annotations significantly enhance AQA performance."

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
Paper passage: "FLEX contains over 7,500 multiview recordings of 20 weight-loaded exercises performed by 38 subjects"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos
cs.CV 2026-04 unverdicted novelty 7.0

ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

Assessing the quality of actions

Hamed Pirsiavash, Carl V ondrick, and Antonio Torralba. Assessing the quality of actions. In European Conference on Computer Vision, pages 556–571. Springer, 2014. 3, 4, 5, 19

work page 2014
[2]

Learning to score olympic events

Paritosh Parmar and Brendan Tran Morris. Learning to score olympic events. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 20–28,

work page
[3]

Jhu-isi gesture and skill assessment working set (jigsaws): A surgical activity dataset for human motion model- ing

Yixin Gao, S Swaroop Vedula, Carol E Reiley, Narges Ahmidi, Balakrishnan Varadarajan, Henry C Lin, Lingling Tao, Luca Zappella, Benjamın Béjar, and David D Yuh. Jhu-isi gesture and skill assessment working set (jigsaws): A surgical activity dataset for human motion model- ing. In Medical Image Computing and Computer Assisted Intervention Workshop, volume ...

work page 2014
[4]

A data set of human body movements for physical rehabilitation exercises

Aleksandar Vakanski, Hyung-pil Jun, David Paul, and Russell Baker. A data set of human body movements for physical rehabilitation exercises. Data, 3(1):2, 2018. 3

work page 2018
[5]

The kimore dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation

Marianna Capecci, Maria Gabriella Ceravolo, Francesco Ferracuti, Sabrina Iarlori, Andrea Monteriu, Luca Romeo, and Federica Verdini. The kimore dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(7):1436–1448, 2019. 3, 5

work page 2019
[6]

Domain knowledge-informed self-supervised representations for workout form assessment

Paritosh Parmar, Amol Gharat, and Helge Rhodin. Domain knowledge-informed self-supervised representations for workout form assessment. In European Conference on Computer Vision, pages 105–123. Springer, 2022. 3, 4, 15

work page 2022
[7]

Egoexo-fitness: Towards egocentric and exocentric full-body action understanding

Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, and Wei-Shi Zheng. Egoexo-fitness: Towards egocentric and exocentric full-body action understanding. In European Conference on Computer Vision, 2024. 3, 4, 5, 15, 16

work page 2024
[8]

Temporal distance matrices for squat classification

Ryoji Ogata, Edgar Simo-Serra, Satoshi Iizuka, and Hiroshi Ishikawa. Temporal distance matrices for squat classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. 3, 4, 15

work page 2019
[9]

Assembly101: A large-scale multi-view video dataset for understanding procedural activities

Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, and Angela Yao. Assembly101: A large-scale multi-view video dataset for understanding procedural activities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21096–21106, 2022. 3

work page 2022
[10]

Gaia: Rethinking action quality assessment for ai-generated videos

Zijian Chen, Wei Sun, Yuan Tian, Jun Jia, Zicheng Zhang, Jiarui Wang, Ru Huang, Xiongkuo Min, Guangtao Zhai, and Wenjun Zhang. Gaia: Rethinking action quality assessment for ai-generated videos. In Advances in Neural Information Processing Systems, 2024. 3

work page 2024
[11]

Action quality assessment across multiple actions

Paritosh Parmar and Brendan Morris. Action quality assessment across multiple actions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1468–1476. IEEE, 2019. 3

work page 2019
[12]

What and how well you performed? a multitask learning approach to action quality assessment

Paritosh Parmar and Brendan Tran Morris. What and how well you performed? a multitask learning approach to action quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 304–313, 2019. 3, 5

work page 2019
[13]

Learning to score figure skating sport videos

Chengming Xu, Yanwei Fu, Bing Zhang, Zitian Chen, Yu-Gang Jiang, and Xiangyang Xue. Learning to score figure skating sport videos. IEEE Transactions on Circuits and Systems for Video Technology, 30(12):4578–4590, 2019. 3

work page 2019
[14]

An asymmetric modeling for action assessment

Jibin Gao, Wei-Shi Zheng, Jia-Hui Pan, Chengying Gao, Yaowei Wang, Wei Zeng, and Jian- huang Lai. An asymmetric modeling for action assessment. In European Conference on Computer Vision, pages 222–238. Springer, 2020. 3

work page 2020
[15]

Hybrid dynamic-static context-aware attention network for action assessment in long videos

Ling-An Zeng, Fa-Ting Hong, Wei-Shi Zheng, Qi-Zhi Yu, Wei Zeng, Yao-Wei Wang, and Jian-Huang Lai. Hybrid dynamic-static context-aware attention network for action assessment in long videos. In Proceedings of the ACM International Conference on Multimedia, pages 2526–2534, 2020. 3 10

work page 2020
[16]

Tsa-net: Tube self-attention network for action quality assessment

Shunli Wang, Dingkang Yang, Peng Zhai, Chixiao Chen, and Lihua Zhang. Tsa-net: Tube self-attention network for action quality assessment. In Proceedings of the ACM International Conference on Multimedia, pages 4902–4910, 2021. 3

work page 2021
[17]

Finediving: A fine-grained dataset for procedure-aware action quality assessment

Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, and Jiwen Lu. Finediving: A fine-grained dataset for procedure-aware action quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2949–2958, 2022. 3, 4

work page 2022
[18]

Logo: A long-form video dataset for group action quality assessment

Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, and Yansong Tang. Logo: A long-form video dataset for group action quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2405–2414, 2023. 3

work page 2023
[19]

Localization- assisted uncertainty score disentanglement network for action quality assessment

Yanli Ji, Lingfeng Ye, Huili Huang, Lijing Mao, Yang Zhou, and Lingling Gao. Localization- assisted uncertainty score disentanglement network for action quality assessment. In Proceed- ings of the ACM International Conference on Multimedia, pages 8590–8597, 2023. 3

work page 2023
[20]

Automatic modelling for interactive action assessment

Jibin Gao, Jia-Hui Pan, Shao-Jie Zhang, and Wei-Shi Zheng. Automatic modelling for interactive action assessment. International Journal of Computer Vision, 131(3):659–679, 2023. 3

work page 2023
[21]

Lucidaction: A hierarchical and multi- model dataset for comprehensive action quality assessment

Linfeng Dong, Wei Wang, Yu Qiao, and Xiao Sun. Lucidaction: A hierarchical and multi- model dataset for comprehensive action quality assessment. In Advances in Neural Information Processing Systems, 2024. 3, 4

work page 2024
[22]

Current developments in surface electromyography

Veysel ALCAN and Murat Z˙INNURO ˘GLU. Current developments in surface electromyography. Turkish Journal of Medical Sciences, 53(5):1019–1031, 2023. 3

work page 2023
[23]

Extracting time-frequency feature of single-channel vastus medialis emg signals for knee exercise pattern recognition

Yi Zhang, Peiyang Li, Xuyang Zhu, Steven W Su, Qing Guo, Peng Xu, and Dezhong Yao. Extracting time-frequency feature of single-channel vastus medialis emg signals for knee exercise pattern recognition. PloS one, 12(7):e0180526, 2017. 3

work page 2017
[24]

Individuals have unique muscle activation signatures as revealed during gait and pedaling

François Hug, Clément V ogel, Kylie Tucker, Sylvain Dorel, Thibault Deschamps, Éric Le Car- pentier, and Lilian Lacourpaille. Individuals have unique muscle activation signatures as revealed during gait and pedaling. Journal of Applied Physiology, 127(4):1165–1174, 2019. 3

work page 2019
[25]

Muscle activation patterns are more constrained and regular in treadmill than in overground human locomotion

Ilaria Mileti, Aurora Serra, Nerses Wolf, Victor Munoz-Martel, Antonis Ekizos, Eduardo Palermo, Adamantios Arampatzis, and Alessandro Santuz. Muscle activation patterns are more constrained and regular in treadmill than in overground human locomotion. Frontiers in Bioengineering and Biotechnology, 8:581619, 2020. 3

work page 2020
[26]

A large calibrated database of hand movements and grasps kinematics

Néstor J Jarque-Bou, Manfredo Atzori, and Henning Müller. A large calibrated database of hand movements and grasps kinematics. Scientific data, 7(1):12, 2020. 3

work page 2020
[27]

Sex-specific tuning of modular muscle activation patterns for locomotion in young and older adults

Alessandro Santuz, Lars Janshen, Leon Brüll, Victor Munoz-Martel, Juri Taborri, Stefano Rossi, and Adamantios Arampatzis. Sex-specific tuning of modular muscle activation patterns for locomotion in young and older adults. PLoS One, 17(6):e0269417, 2022. 3

work page 2022
[28]

semg dataset of routine activities

Asad Mansoor Khan, Sajid Gul Khawaja, Muhammad Usman Akram, and Ali Saeed Khan. semg dataset of routine activities. Data in brief, 33:106543, 2020. 3

work page 2020
[29]

Hristo Dimitrov, Anthony M. J. Bull, and Dario Farina. High-density EMG, IMU, kinetic, and kinematic open-source data for comprehensive locomotion activities. Scientific Data, 10(1):1–10, 2023. 3

work page 2023
[30]

A comparison of neural control of the biarticular gastrocnemius muscles between knee flexion and ankle plantar flexion

Raphaël Hamard, Jeroen Aeles, Simon Avrillon, Taylor JM Dick, and François Hug. A comparison of neural control of the biarticular gastrocnemius muscles between knee flexion and ankle plantar flexion. Journal of Applied Physiology, 135(2):394–404, 2023. 3

work page 2023
[31]

A wearable real-time kinetic measurement sensor setup for human locomotion

Huawei Wang, Akash Basu, Guillaume Durandau, and Massimo Sartori. A wearable real-time kinetic measurement sensor setup for human locomotion. Wearable technologies, 4:e11, 2023. 3 11

work page 2023
[32]

Electromyo- graphy data for non-invasive naturally-controlled robotic hand prostheses

Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Barbara Caputo, Anne-Gabrielle Mittaz Hager, Simone Elsig, Giorgio Giatsidis, Franco Bassetto, and Henning Müller. Electromyo- graphy data for non-invasive naturally-controlled robotic hand prostheses. Scientific data, 1(1):1–13, 2014. 3

work page 2014
[33]

Neuropose: 3d hand pose tracking using emg wearables

Yilin Liu, Shijia Zhang, and Mahanth Gowda. Neuropose: 3d hand pose tracking using emg wearables. In Proceedings of the Web Conference, pages 1471–1482, 2021. 3

work page 2021
[34]

Sensing the full dynamics of the human hand with a neural interface and deep learning

Raul C Sîmpetru, Andreas Arkudas, Dominik I Braun, Marius Osswald, Daniela Souza de Oliveira, Bjoern Eskofier, Thomas M Kinfe, and Alessandro Del Vecchio. Sensing the full dynamics of the human hand with a neural interface and deep learning. BioRxiv, pages 2022–07, 2022. 3

work page 2022
[35]

Dataset for multi- channel surface electromyography (semg) signals of hand gestures

Mehmet Akif Ozdemir, Deniz Hande Kisa, Onan Guren, and Aydin Akan. Dataset for multi- channel surface electromyography (semg) signals of hand gestures. Data in brief, 41:107921,

work page
[36]

emg2pose: A large and diverse benchmark for surface electromyographic hand pose estimation

Sasha Salter, Richard Warren, Collin Schlager, Adrian Spurr, Shangchen Han, Rohin Bhasin, Yujun Cai, Peter Walkington, Anuoluwapo Bolarinwa, Robert J Wang, et al. emg2pose: A large and diverse benchmark for surface electromyographic hand pose estimation. Advances in Neural Information Processing Systems, 37:55703–55728, 2024. 3, 9

[37]

FASTMOVE. Fastmove wireless emg, 2024. 3, 5

[38]

FASTMOVE. Fastmove 3d motion for realtime, 2024. 4

[39]

ZCAM. Zcam e2-m4, 2020. 4

[40]

OLYMPUS. M.zuiko digital ed 14-150mm f4.0-5.6, 2022. 5

[41]

OnePlus. Oneplus 7, 2019. 5

[42]

Paritosh Parmar, Jaiden Reddy, and Brendan Morris. Piano skills assessment. In IEEE International Workshop on Multimedia Signal Processing, pages 1–5. IEEE, 2021. 5

[43]

Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, and Xiu Li. Flag3d: A 3d fitness activity dataset with language instruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22106–22117, 2023. 5

[44]

Ministry of Human Resources and Social Security of the People’s Republic of China and General Administration of Sport of China. National occupational skill standard — social sports instructor (occupational code: 4-13-04-01). Standard, Ministry of Human Resources and Social Security of the People’s Republic of China and General Administration of Sport of C...

[45]

Human Resources Development Center of the General Administration of Sport of China. Occupational Competency Training Textbook for Social Sports Instructors—Fitness Coaches (with Technical Action Videos). Higher Education Press, 2023. 6

[46]

Beijing Sport University. Fitness and Bodybuilding Tutorial. Beijing Sport University Press,

[47]

Joe Weider. Joe Weider’s Bodybuilding System. Weider Pubns, 1998. 6

[48]

Xumin Yu, Yongming Rao, Wenliang Zhao, Jiwen Lu, and Jie Zhou. Group-aware contrastive regression for action quality assessment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7919–7928, 2021. 7, 8, 19, 20

[49]

Sijie Yan, Yuanjun Xiong, and Dahua Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018. 8

[50]

Amir Shahroudy, Jun Liu, Tian-Tsong Ng, and Gang Wang. NTU RGB+D: A large scale dataset for 3d human activity analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1010–1019, 2016. 8

[51]

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923,

[52]

Hao Yin, Paritosh Parmar, Daoliang Xu, Yang Zhang, Tianyou Zheng, and Weiwei Fu. A decade of action quality assessment: Largest systematic survey of trends, challenges, and future directions. arXiv, 2025. 15

[53]

Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? a new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017. 19

[54]

Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, Zhangchi Feng, and Yongqiang Ma. Llamafactory: Unified efficient fine-tuning of 100+ language models. arXiv preprint arXiv:2403.13372, 2024. 20

Checklist

For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope? [Yes] Please refer to section 1.
(b) Did you describe the limitations of your work? [Yes] Please refer to section 5.
(c) Did you discuss any potential negative societal impacts of your work? [Yes] Please refer to the Appe...

If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [NA]
(b) Did you include complete proofs of all theoretical results? [NA]

If you ran experiments (e.g. for benchmarks)...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Please refer to the Appendix.
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Please...

If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [Yes] Please refer to section 4.
(b) Did you mention the license of the assets? [Yes] Please refer to the Appendix.
(c) Did you include any new assets either in the supplemental material or as a ...

If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [Yes] Please refer to the Appendix.
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [Yes] Please refer ...

Exercises: 1. Kneeling Push-ups 2. Push-ups 3. Kneeling Torso Twist 4. Knee Raise + Abs Contract 5. Shoulder Bridge 6. Sit-ups 7. Leg Reverse Lunge 8. Leg Lunge with Knee Lift 9. Sumo Squat 10. Jumping Jacks

Muscles: 1. Pectoralis major 2. Anterior deltoid 3. Triceps brachii 4. External obliques 5. Internal obliques 6. Rectus abdominis 7. Iliopsoas 8. Gluteus maximus 9. Hamstrings 10...

Exercises: 1. Standing Barbell Overhead Press 2. Barbell Bicep Curl 3. Barbell Upright Row 4. Dumbbell Front Raise 5. Dumbbell Bicep Curl 6. Dumbbell Lateral Raise 7. Bent-Over Dumbbell Reverse Fly 8. Flat Barbell Bench Press 9. In...

Muscles: 1. Pectoralis major 2. Anterior deltoid 3. Middle deltoid 4. Posterior deltoid 5. Triceps brachii 6. Biceps brachii 7. Brachialis 8. Supraspinatus

action recognition → action standards → action evaluation → action scoring,

We also report the top 20 most frequent error types. Additionally, we provide the weight loads each subject uses for each action.

Figure 8. [Plots over actions A01 to A20; recoverable axis labels: Action, Sample Number, DurationS...]


