A Comprehensive Survey of Action Quality Assessment: Method and Benchmark

Hubert P. H. Shum; Kanglei Zhou; Liyuan Wang; Ruizhi Cai; Xiaohui Liang

arxiv: 2412.11149 · v2 · pith:CDUTTEICnew · submitted 2024-12-15 · 💻 cs.CV

Kanglei Zhou, Ruizhi Cai, Liyuan Wang, Hubert P. H. Shum, Xiaohui Liang

Pith reviewed 2026-05-23 06:52 UTC · model grok-4.3

classification 💻 cs.CV

keywords action quality assessment · survey · taxonomy · benchmark · video-based methods · skeleton-based methods · multi-modal approaches · computer vision


The pith

A modality-driven hierarchical taxonomy organizes AQA methods by input type while a unified benchmark standardizes comparisons for video-based approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the problem that AQA research uses varied datasets and evaluation settings that prevent direct method comparisons. It introduces a modality-driven hierarchical taxonomy that places methods into video-based, skeleton-based, and multi-modal groups and traces how representative models have developed within those groups. The work also builds a unified benchmark that merges several datasets under shared protocols to measure both accuracy and computational efficiency for video-based methods. Readers care because AQA supports practical uses in sports analysis, skill training, and healthcare where reliable scoring matters. The survey ends by mapping current trends, open challenges, and possible research paths.

Core claim

Existing AQA studies rely on heterogeneous datasets and evaluation settings that make systematic comparisons across methods difficult. The survey proposes a modality-driven hierarchical taxonomy that organizes methods into video-based, skeleton-based, and multi-modal approaches and analyzes the evolution of representative models. It further establishes a unified benchmark that integrates diverse datasets and applies standardized evaluation protocols to representative video-based AQA methods, allowing consistent comparison on accuracy and computational efficiency. The paper then examines emerging trends, identifies key challenges, and outlines future directions ranging from near-term methodological advances to longer-term opportunities enabled by emerging AI paradigms.

What carries the argument

The modality-driven hierarchical taxonomy that classifies AQA methods according to input modality, together with the unified benchmark that combines datasets and protocols for video-based methods.
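To make the modality-driven split concrete, here is a minimal Python sketch that routes a method to one of the three top-level branches based on the input streams it consumes. The stream labels (`rgb`, `optical_flow`, `skeleton`, and anything else such as audio or text) are hypothetical. The rule that combining video with skeleton counts as multi-modal is also an assumption for illustration. Neither is the survey's stated inclusion rule.

```python
from enum import Enum


class AQAModality(Enum):
    VIDEO = "video-based"
    SKELETON = "skeleton-based"
    MULTI_MODAL = "multi-modal"


# Hypothetical stream labels; not taken from the survey.
VISUAL_STREAMS = {"rgb", "optical_flow"}
POSE_STREAMS = {"skeleton"}


def classify_method(inputs: set[str]) -> AQAModality:
    """Route a method to a top-level branch based on the streams it consumes."""
    if not inputs:
        raise ValueError("a method must consume at least one input stream")
    families = set()
    for stream in inputs:
        if stream in VISUAL_STREAMS:
            families.add("visual")
        elif stream in POSE_STREAMS:
            families.add("pose")
        else:
            # Any other stream (e.g. audio or text) is treated as its own modality.
            families.add(("other", stream))
    if len(families) > 1:
        return AQAModality.MULTI_MODAL
    if families == {"pose"}:
        return AQAModality.SKELETON
    if families == {"visual"}:
        return AQAModality.VIDEO
    raise ValueError(f"no single-modality branch for {sorted(inputs)}")


print(classify_method({"rgb", "optical_flow"}).value)  # video-based
print(classify_method({"rgb", "audio"}).value)         # multi-modal
```

Grouping RGB and optical flow into one "visual" family keeps two-stream video networks in the video-based branch. Under this sketch, only genuinely different signal types trigger the multi-modal label.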

If this is right

Video-based AQA methods can be compared directly on both accuracy and efficiency using the shared protocols.
Methodological changes across video, skeleton, and multi-modal categories become easier to track over time.
Key challenges in current AQA work are listed for focused attention in follow-on studies.
Future research directions are separated into near-term modeling advances and longer-term uses of new AI paradigms.
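Comparing methods on accuracy and efficiency at once can be treated as a two-objective problem. The sketch below assumes accuracy is a higher-is-better score and efficiency is measured in GFLOPs (lower is better). It keeps only the methods that no other method beats on both axes. The method names and numbers are made up for illustration and are not results from the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    name: str
    accuracy: float  # higher is better (e.g. a rank-correlation score)
    gflops: float    # lower is better


def dominates(a: BenchmarkResult, b: BenchmarkResult) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.gflops <= b.gflops
    strictly_better = a.accuracy > b.accuracy or a.gflops < b.gflops
    return no_worse and strictly_better


def pareto_front(results: list[BenchmarkResult]) -> list[BenchmarkResult]:
    """Keep the results that no other result dominates, in input order."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Made-up entries for illustration only; not benchmark numbers.
results = [
    BenchmarkResult("A", 0.90, 100.0),
    BenchmarkResult("B", 0.88, 20.0),
    BenchmarkResult("C", 0.85, 50.0),
    BenchmarkResult("D", 0.80, 10.0),
]
print([r.name for r in pareto_front(results)])  # ['A', 'B', 'D']
```

Here "C" drops out because "B" is both more accurate and cheaper. The remaining methods represent genuine trade-offs that a deployment choice would have to weigh.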

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The taxonomy could be updated to include emerging input types such as depth or wearable sensor streams.
The benchmark protocols might be reused or adapted to create similar standardized tests for skeleton-based or multi-modal methods.
Efficiency results from the benchmark could inform deployment choices in real-time applications like coaching tools.
The organization of methods by modality may reveal under-explored combinations that future work could test.

Load-bearing premise

The chosen representative methods and datasets, when placed under standardized protocols, still support valid cross-method comparisons despite differences among the original datasets.

What would settle it

Re-running the benchmark protocols on a fresh collection of datasets or with alternate evaluation metrics produces substantially reordered accuracy or efficiency rankings among the same methods.
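One way to quantify "substantially reordered" is to compute Kendall's tau between the method rankings produced by the original protocol and by a re-run protocol. A value of 1.0 means the order is identical and -1.0 means it is fully reversed. The sketch below assumes rankings without ties. The paper does not fix what threshold would count as "substantial", and the method names are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a: dict[str, int], rank_b: dict[str, int]) -> float:
    """Kendall's tau-a between two tie-free rankings of the same methods."""
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same methods")
    methods = sorted(rank_a)
    if len(methods) < 2:
        raise ValueError("need at least two methods to compare orderings")
    concordant = discordant = 0
    for m1, m2 in combinations(methods, 2):
        # Positive product: the pair is ordered the same way under both protocols.
        s = (rank_a[m1] - rank_a[m2]) * (rank_b[m1] - rank_b[m2])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(methods) * (len(methods) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings (1 = best) under the original and a re-run protocol.
original = {"M1": 1, "M2": 2, "M3": 3}
rerun = {"M1": 1, "M2": 3, "M3": 2}
print(round(kendall_tau(original, rerun), 3))  # 0.333
```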

Figures

Figures reproduced from arXiv: 2412.11149 by Hubert P. H. Shum, Kanglei Zhou, Liyuan Wang, Ruizhi Cai, Xiaohui Liang.

**Figure 2.** The overall structure of our comprehensive survey. Our survey presents three core contributions: a …

**Figure 3.** Illustration of the common AQA framework, consisting of three …

**Figure 4.** Three typical fine-grained reasoning approaches in AQA. (a) …

**Figure 5.** The procedural nature of actions in fine temporal modeling for …

**Figure 7.** Illustration of typical multi-modal AQA methods.

**Figure 8.** Computational performance comparison with selected baselines …

Original abstract

Action Quality Assessment (AQA) aims to automatically evaluate how well human actions are performed and has been widely applied in sports analysis, skill assessment, and healthcare. However, AQA studies are often developed under heterogeneous datasets and evaluation settings, making systematic comparison across methods difficult. To address these challenges, we present a comprehensive survey of recent advances in AQA. In particular, we propose a modality-driven hierarchical taxonomy that organizes existing methods into video-based, skeleton-based, and multi-modal approaches, and analyze the methodological evolution of representative models. We further establish a unified benchmark for representative video-based AQA methods by integrating diverse datasets and standardized evaluation protocols, enabling consistent comparison in terms of both accuracy and computational efficiency. Finally, we analyze emerging research trends, identify key challenges in current AQA research, and outline future directions ranging from near-term methodological advances to longer-term opportunities enabled by emerging AI paradigms. The project web page can be found at https://ZhouKanglei.github.io/AQA-Survey.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This survey organizes AQA work with a modality-driven taxonomy and a unified video benchmark, but the benchmark's cross-dataset comparisons look vulnerable to domain and scale differences.


The paper's main value is the taxonomy that splits methods into video-based, skeleton-based, and multi-modal categories, plus the attempt to run a shared benchmark on video approaches by pulling in multiple datasets under one protocol. That gives the subfield a clearer map than scattered prior reviews and lets people see accuracy and efficiency numbers side by side for the first time in one place. The write-up of method evolution is straightforward and cites the key papers without obvious gaps in the abstract-level coverage. Credit for shipping the project page with the benchmark details too; that makes the numbers easier to check later.

The unification step is the soft spot. Datasets in this area come from sports, healthcare, and skill training, with different score ranges and action types. Without explicit per-dataset rescaling or domain checks, the reported rankings can shift with how the integration was done rather than reflect true method strength. The abstract does not spell out those normalization steps, so the benchmark's reliability stays an open question until the full text and code are examined.

This is aimed at CV researchers who work on action assessment or want a quick entry point into the literature. A reader who needs to pick a baseline or understand the split between modalities will find it useful. It is not a methods paper, so it will not change core techniques, but the organization effort is worth referee time. I would send it for review with a request that the benchmark section show exactly how heterogeneity was handled.

Referee Report

2 major / 2 minor

Summary. The paper surveys recent advances in Action Quality Assessment (AQA), proposing a modality-driven hierarchical taxonomy that classifies methods into video-based, skeleton-based, and multi-modal categories while analyzing their methodological evolution. It further establishes a unified benchmark for representative video-based AQA methods through integration of diverse datasets under standardized evaluation protocols, enabling comparisons on accuracy and efficiency, and concludes with analysis of trends, challenges, and future directions.

Significance. If the taxonomy provides a clear organizing framework and the benchmark delivers reproducible, valid cross-method comparisons, the work would offer a useful reference point for a fragmented research area, potentially reducing redundant experimentation and highlighting efficiency-accuracy trade-offs in AQA.

major comments (2)

[Abstract] Abstract and benchmark description: the claim that standardized protocols enable 'consistent comparison' across heterogeneous datasets is load-bearing for the central benchmark contribution, yet the manuscript provides no explicit description of normalization for differing scoring scales (absolute vs. relative) or domain shifts (sports vs. healthcare), leaving open the possibility that reported rankings reflect unification artifacts rather than intrinsic method properties.
[Benchmark section] Benchmark integration section: without reported per-dataset score rescaling, subset selection criteria, or domain-adaptation checks, the unified evaluation protocol risks invalidating cross-dataset accuracy and efficiency comparisons; this directly affects the validity of the 'representative' method rankings presented.
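A common practice in the AQA literature for sidestepping cross-dataset scale differences has two steps. Each dataset is scored separately with Spearman's rank correlation, which ignores score units. The per-dataset values are then averaged through the Fisher z-transform. This section does not state whether the survey's benchmark does exactly this. The following stdlib-only Python is a sketch of the technique. It includes a small clamp `eps` (an assumption) so that a perfect correlation does not make `atanh` diverge.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(pred: list[float], true: list[float]) -> float:
    """Pearson correlation of the ranks; invariant to each list's score scale."""
    if len(pred) != len(true) or len(pred) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    rp, rt = average_ranks(pred), average_ranks(true)
    mp, mt = sum(rp) / len(rp), sum(rt) / len(rt)
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    sp = math.sqrt(sum((a - mp) ** 2 for a in rp))
    st = math.sqrt(sum((b - mt) ** 2 for b in rt))
    if sp == 0 or st == 0:
        raise ValueError("correlation undefined for constant scores")
    return cov / (sp * st)


def fisher_z_mean(rhos: list[float], eps: float = 1e-6) -> float:
    """Average correlations in Fisher z-space; eps keeps atanh finite at |rho| = 1."""
    zs = [math.atanh(max(-1 + eps, min(1 - eps, r))) for r in rhos]
    return math.tanh(sum(zs) / len(zs))


# Two hypothetical datasets on very different score scales (made-up values).
rho_a = spearman_rho([70.1, 85.0, 60.2, 90.3], [72.0, 88.5, 65.0, 80.0])  # out of 100
rho_b = spearman_rho([0.2, 0.9, 0.5, 0.4], [0.1, 0.8, 0.3, 0.7])  # in [0, 1]
print(round(fisher_z_mean([rho_a, rho_b]), 3))  # 0.8
```

Because each correlation is computed within one dataset, a dataset scored out of 100 and one scored in [0, 1] contribute on equal terms. The scale issue the referee raises therefore affects the pooled aggregate less, though domain shift is a separate concern.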

minor comments (2)

[Abstract] The project webpage URL is given, but there are no details on whether the benchmark code or dataset splits are released; releasing them would strengthen reproducibility claims.
[Taxonomy section] The taxonomy figure or table would benefit from explicit inclusion criteria for methods, to avoid selection bias in the hierarchical organization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which correctly identify areas where the benchmark contribution requires greater transparency. We will revise the manuscript to supply the missing methodological details on normalization and protocol standardization.


Referee: [Abstract] Abstract and benchmark description: the claim that standardized protocols enable 'consistent comparison' across heterogeneous datasets is load-bearing for the central benchmark contribution, yet the manuscript provides no explicit description of normalization for differing scoring scales (absolute vs. relative) or domain shifts (sports vs. healthcare), leaving open the possibility that reported rankings reflect unification artifacts rather than intrinsic method properties.

Authors: We agree that the abstract and benchmark description would be strengthened by explicit statements on these points. In revision we will add a concise paragraph to the abstract and a dedicated methods subsection that describes: (i) the score normalization applied to each dataset (min-max to [0,1] for absolute scores and rank-based conversion for relative scores), (ii) the criteria used to select comparable action subsets across sports and healthcare domains, and (iii) the absence of explicit domain-adaptation modules together with the rationale that cross-dataset comparison is performed only after per-dataset standardization. These additions will make the unification process reproducible and will allow readers to assess whether rankings reflect method properties. revision: yes
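The two normalizations named in this response can be sketched as follows: min-max scaling to [0, 1] for absolute scores, and a rank-based conversion for relative scores. The rebuttal is simulated, so these are not confirmed to be the paper's procedure. The exact rank formula used here is also an assumption: the average 0-based rank divided by n - 1, with tied scores sharing a rank.

```python
def min_max_normalize(scores: list[float]) -> list[float]:
    """Rescale absolute scores linearly so the dataset's min maps to 0 and max to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("cannot min-max normalize constant scores")
    return [(s - lo) / (hi - lo) for s in scores]


def rank_normalize(scores: list[float]) -> list[float]:
    """Map scores to [0, 1] by rank; tied scores share their average rank."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    positions: dict[float, list[int]] = {}
    for idx, value in enumerate(sorted(scores)):
        positions.setdefault(value, []).append(idx)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [avg_rank[s] / (n - 1) for s in scores]


# Hypothetical raw scores from two datasets with different conventions.
print(min_max_normalize([60.0, 80.0, 100.0]))  # [0.0, 0.5, 1.0]
print(rank_normalize([3.2, 1.0, 2.5]))         # [1.0, 0.0, 0.5]
```

Min-max scaling preserves the spacing between scores but is sensitive to outliers at either end. The rank conversion discards spacing entirely, which suits datasets that only provide pairwise or relative judgments.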
Referee: [Benchmark section] Benchmark integration section: without reported per-dataset score rescaling, subset selection criteria, or domain-adaptation checks, the unified evaluation protocol risks invalidating cross-dataset accuracy and efficiency comparisons; this directly affects the validity of the 'representative' method rankings presented.

Authors: The referee correctly notes that the current text does not report these implementation details. We will expand the benchmark integration section with: (1) the exact rescaling formulas and code-level implementation for each dataset, (2) the subset selection rules (e.g., action classes present in at least three datasets, minimum sample size thresholds), and (3) a short discussion of domain shift mitigation (or its absence) together with any post-hoc checks performed. If certain normalizations prove infeasible for particular datasets, we will state the limitation and qualify the corresponding rankings accordingly. revision: yes
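The example subset rule in this response keeps action classes that appear in at least three datasets, subject to a minimum sample size. It translates into a short filter. The function name, the input layout, and the default threshold of 50 samples are hypothetical choices for illustration.

```python
from collections import Counter


def select_action_classes(
    datasets: dict[str, dict[str, int]],
    min_datasets: int = 3,
    min_samples: int = 50,
) -> list[str]:
    """Return classes that have >= min_samples samples in >= min_datasets datasets.

    `datasets` maps a dataset name to {action class: number of samples}.
    """
    coverage = Counter()
    for per_class in datasets.values():
        for action, count in per_class.items():
            # A dataset only counts toward coverage if it has enough samples.
            if count >= min_samples:
                coverage[action] += 1
    return sorted(a for a, c in coverage.items() if c >= min_datasets)


# Hypothetical per-dataset class counts, for illustration only.
catalogue = {
    "ds1": {"diving": 100, "vault": 30},
    "ds2": {"diving": 80, "vault": 60},
    "ds3": {"diving": 55, "vault": 70, "skating": 200},
    "ds4": {"vault": 90},
}
print(select_action_classes(catalogue))  # ['diving', 'vault']
```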

Circularity Check

0 steps flagged

Survey paper with no derivation chain exhibits no circularity

full rationale

This is a literature survey that organizes existing AQA methods into a modality-driven taxonomy and re-evaluates representative video-based methods on integrated datasets under standardized protocols. No equations, fitted parameters, predictions, or uniqueness theorems appear in the manuscript. All claims rest on citations to external prior work rather than any self-contained derivation that reduces to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work introduces no new free parameters, axioms, or invented entities; it compiles and structures existing published methods and datasets.

pith-pipeline@v0.9.0 · 5715 in / 1031 out tokens · 66546 ms · 2026-05-23T06:52:30.253529+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback
cs.CV 2026-05 unverdicted novelty 5.0

SkillFormer, PATS, and ProfVLM deliver state-of-the-art multi-view proficiency estimation on Ego-Exo4D with up to 20x fewer parameters by combining selective fusion, dense sampling, and generative feedback.

Reference graph

Works this paper leans on

181 extracted references · 181 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1] Y. Li, X. Chai, and X. Chen, “End-to-end learning for action quality assessment,” in PCM, pp. 125–134, Springer, 2018
[2] W. Sun, Y. Hu, B. Zhang, X. Chen, C. Hao, and Y. Gao, “A novel blind action quality assessment based on multi-headed gru network and attention mechanism,” in AIAHPC, vol. 12717, pp. 835–843, SPIE, 2023
[3] D. Zhang, D. Zhou, and H. Liu, “Action quality assessment for asd behaviour evaluation,” in ICMLC, pp. 483–488, IEEE, 2023
[4] D. Liu, Q. Li, T. Jiang, Y. Wang, R. Miao, F. Shan, and Z. Li, “Towards unified surgical skill assessment,” in CVPR, pp. 9522–9531, 2021
[5] C. K. Ingwersen, A. Xarles, A. Clapés, M. Madadi, J. N. Jensen, M. R. Hannemose, A. B. Dahl, and S. Escalera, “Video-based skill assessment for golf: Estimating golf handicap,” in International Workshop on Multimedia Content Analysis in Sports, pp. 31–39, 2023
[6] A. S. Gordon, “Automated video assessment of human performance,” in AI-ED, vol. 2, p. 10, 1995
[7] C. Xu, Y. Fu, B. Zhang, Z. Chen, Y.-G. Jiang, and X. Xue, “Learning to score figure skating sport videos,” IEEE TCSVT, vol. 30, no. 12, pp. 4578–4590, 2019
[8] Y. Zhang, W. Xiong, and S. Mi, “Learning time-aware features for action quality assessment,” PRL, vol. 158, pp. 104–110, 2022
[9] M. Nekoui, F. O. T. Cruz, and L. Cheng, “Eagle-eye: Extreme-pose action grader using detail bird’s-eye view,” in WACV, pp. 394–402, 2021
[10] Y. Liu, X. Cheng, and T. Ikenaga, “A hierarchical joint training based replay-guided contrastive transformer for action quality assessment of figure skating,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2024
[11] M. Capecci, M. G. Ceravolo, F. Ferracuti, S. Iarlori, A. Monteriu, L. Romeo, and F. Verdini, “The kimore dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation,” TNSRE, vol. 27, no. 7, pp. 1436–1448, 2019
[12] M. Fieraru, M. Zanfir, S. C. Pirlea, V. Olaru, and C. Sminchisescu, “Aifit: Automatic 3d human-interpretable feedback models for fitness training,” in CVPR, pp. 9919–9928, 2021
[13] K. Zhou, R. Cai, Y. Ma, Q. Tan, X. Wang, J. Li, H. P. Shum, F. W. Li, S. Jin, and X. Liang, “A video-based augmented reality system for human-in-the-loop muscle strength assessment of juvenile dermatomyositis,” IEEE TVCG, vol. 29, no. 5, pp. 2456–2466, 2023
[14] P. Parmar, J. Reddy, and B. Morris, “Piano skills assessment,” in MMSP, pp. 1–5, IEEE, 2021
[15] Q. Zhang and B. Li, “Relative hidden markov models for video-based evaluation of motion skills in surgical training,” IEEE TPAMI, vol. 37, no. 6, pp. 1206–1218, 2014
[16] H. Wang and C. Schmid, “Action recognition with improved trajectories,” in ICCV, pp. 3551–3558, 2013
[17] P. Scovanner, S. Ali, and M. Shah, “A 3-dimensional sift descriptor and its application to action recognition,” in ACM MM, pp. 357–360, 2007
[18] V. Venkataraman, P. Turaga, N. Lehrer, M. Baran, T. Rikakis, and S. Wolf, “Attractor-shape for dynamical analysis of human movement: Applications in stroke rehabilitation and action recognition,” in CVPRW, pp. 514–520, 2013
[19] S. Chi, H.-g. Chi, Q. Huang, and K. Ramani, “Infogcn++: Learning representation by predicting the future for online skeleton-based action recognition,” IEEE TPAMI, 2024
[20] Q. Lei, J.-X. Du, H.-B. Zhang, S. Ye, and D.-S. Chen, “A survey of vision-based human action evaluation methods,” Sensors, vol. 19, no. 19, p. 4129, 2019
[21] S. Wang, D. Yang, P. Zhai, Q. Yu, T. Suo, Z. Sun, K. Li, and L. Zhang, “A survey of video-based action quality assessment,” in INSAI, pp. 1–9, IEEE, 2021
[22] J. Liu, H. Wang, K. Stawarz, S. Li, Y. Fu, and H. Liu, “Vision-based human action quality assessment: A systematic review,” Expert Systems with Applications, p. 125642, 2024
[23] L. Wang, X. Zhang, H. Su, and J. Zhu, “A comprehensive survey of continual learning: theory, method and application,” TPAMI, 2024
[24] K. Zhou, Y. Ma, H. P. Shum, and X. Liang, “Hierarchical graph convolutional networks for action quality assessment,” IEEE TCSVT, vol. 33, no. 12, pp. 7749–7763, 2023
[25] X. Yu, Y. Rao, W. Zhao, J. Lu, and J. Zhou, “Group-aware contrastive regression for action quality assessment,” in ICCV, pp. 7919–7928, 2021
[26] H. Xu, X. Ke, Y. Li, R. Xu, H. Wu, X. Lin, and W. Guo, “Vision-language action knowledge learning for semantic-aware action quality assessment,” in ECCV, 2024
[27] S. Zhang, S. Bai, G. Chen, L. Chen, J. Lu, J. Wang, and Y. Tang, “Narrative action evaluation with prompt-guided multimodal interaction,” in CVPR, pp. 18430–18439, 2024
[28] A. Majeedi, V. R. Gajjala, S. S. S. N. GNVV, and Y. Li, “Rica²: Rubric-informed, calibrated assessment of actions,” arXiv preprint arXiv:2408.02138, 2024
[29] L.-A. Zeng and W.-S. Zheng, “Multimodal action quality assessment,” IEEE TIP, 2024
[30] S.-J. Zhang, J.-H. Pan, J. Gao, and W.-S. Zheng, “Semi-supervised action quality assessment with self-supervised segment feature recovery,” IEEE TCSVT, vol. 32, no. 9, pp. 6017–6028, 2022
[31] W. Yun, M. Qi, F. Peng, and H. Ma, “Semi-supervised teacher-reference-student architecture for action quality assessment,” arXiv preprint arXiv:2407.19675, 2024
[32] K. Zhou, L. Wang, X. Zhang, H. P. Shum, F. W. Li, J. Li, and X. Liang, “Magr: Manifold-aligned graph regularization for continual action quality assessment,” arXiv preprint arXiv:2403.04398, 2024
[33] Y.-M. Li, L.-A. Zeng, J.-K. Meng, and W.-S. Zheng, “Continual action assessment via task-consistent score-discriminative feature distribution modeling,” IEEE TCSVT, 2024
[34] A. Dadashzadeh, S. Duan, A. Whone, and M. Mirmehdi, “Pecop: Parameter efficient continual pretraining for action quality assessment,” in WACV, pp. 42–52, 2024
[35] Y.-M. Li, A.-L. Wang, K.-Y. Lin, T. Yu-Ming, L.-A. Zeng, J.-F. Hu, and W.-S. Zheng, “Techcoach: Towards technical keypoint-aware descriptive action coaching,” arXiv preprint arXiv:2411.17130, 2024
[36] A. Xu, L.-A. Zeng, and W.-S. Zheng, “Likert scoring with grade decoupling for long-term action assessment,” in CVPR, pp. 3232–3241, 2022
[37] P. Parmar and B. T. Morris, “What and how well you performed? a multitask learning approach to action quality assessment,” in CVPR, pp. 304–313, 2019
[38] Y. Liu, X. Cheng, and T. Ikenaga, “A figure skating jumping dataset for replay-guided action quality assessment,” in ACM MM, pp. 2437–2445, 2023
[39] T. Wang, Y. Wang, and M. Li, “Towards accurate and interpretable surgical skill assessment: A video-based method incorporating recognized surgical gestures and skill levels,” in MICCAI, pp. 668–678, Springer, 2020
[40] H. Doughty, D. Damen, and W. Mayol-Cuevas, “Who’s better? who’s best? pairwise deep ranking for skill determination,” in CVPR, pp. 6057–6066, 2018
[41] M. Fang, X. Du, Q. Liu, Y. Zhou, Q. Liang, and S. Liu, “Which is the better teacher action? a new ranking model and dataset,” in ICASSP, pp. 7695–7699, IEEE, 2024
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” NeurIPS, vol. 25, 2012
[43] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, pp. 770–778, 2016
[44] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” in ICCV, pp. 4489–4497, 2015
[45] Z. Qiu, T. Yao, and T. Mei, “Learning spatio-temporal representation with pseudo-3d residual networks,” in ICCV, pp. 5533–5541, 2017
[46] J. Carreira and A. Zisserman, “Quo vadis, action recognition? a new model and the kinetics dataset,” in CVPR, pp. 6299–6308, 2017
[47] Z. Liu, J. Ning, Y. Cao, Y. Wei, Z. Zhang, S. Lin, and H. Hu, “Video swin transformer,” in CVPR, pp. 3202–3211, 2022
[48] S. Wang, D. Yang, P. Zhai, C. Chen, and L. Zhang, “Tsa-net: Tube self-attention network for action quality assessment,” in ACM MM, pp. 4902–4910, 2021
[49] T. Nagai, S. Takeda, M. Matsumura, S. Shimizu, and S. Yamamoto, “Action quality assessment with ignoring scene context,” in ICIP, pp. 1189–1193, IEEE, 2021
[50] J.-H. Pan, J. Gao, and W.-S. Zheng, “Action assessment by joint relation graphs,” in ICCV, pp. 6331–6340, 2019
[51] J.-H. Pan, J. Gao, and W.-S. Zheng, “Adaptive action assessment,” IEEE TPAMI, vol. 44, no. 12, pp. 8779–8795, 2021
[52] K. Gedamu, Y. Ji, Y. Yang, J. Shao, and H. T. Shen, “Self-supervised subaction parsing network for semi-supervised action quality assessment,” IEEE TIP, 2024
[53] K. Gedamu, Y. Ji, Y. Yang, J. Shao, and H. T. Shen, “Fine-grained spatio-temporal parsing network for action quality assessment,” IEEE TIP, vol. 32, pp. 6386–6400, 2023
[54] Z. Li, L. Gu, W. Wang, R. Nakamura, and Y. Sato, “Surgical skill assessment via video semantic aggregation,” in MICCAI, pp. 410–420, Springer, 2022
[55] L. Okamoto and P. Parmar, “Hierarchical neurosymbolic approach for comprehensive and explainable action quality assessment,” in CVPRW, pp. 3204–3213, 2024
[56] X. Dong, X. Liu, W. Li, A. Adeyemi-Ejeye, and A. Gilbert, “Interpretable long-term action quality assessment,” arXiv preprint arXiv:2408.11687, 2024
[57] J. Xu, Y. Rao, X. Yu, G. Chen, J. Zhou, and J. Lu, “Finediving: A fine-grained dataset for procedure-aware action quality assessment,” in CVPR, pp. 2949–2958, 2022
[58] Y. Bai, D. Zhou, S. Zhang, J. Wang, E. Ding, Y. Guan, Y. Long, and J. Wang, “Action quality assessment with temporal parsing transformer,” in ECCV, pp. 422–438, Springer, 2022
[59] J. Xu, S. Yin, G. Zhao, Z. Wang, and Y. Peng, “Fineparser: A fine-grained spatio-temporal action parser for human-centric action quality assessment,” in CVPR, pp. 14628–14637, 2024
[60] H. Matsuyama, N. Kawaguchi, and B. Y. Lim, “Iris: Interpretable rubric-informed segmentation for action quality assessment,” in ICIUI, pp. 368–378, 2023
[61]

Uncertainty-aware score distribution learning for action quality assessment,

Y. Tang, Z. Ni, J. Zhou, D. Zhang, J. Lu, Y. Wu, and J. Zhou, “Uncertainty-aware score distribution learning for action quality assessment,” in CVPR, pp. 9839–9848, 2020

work page 2020
[62]

Uncertainty-driven action quality assessment,

C. Zhou, Y. Huang, and H. Ling, “Uncertainty-driven action quality assessment,” arXiv preprint arXiv:2207.14513, 2022

work page arXiv 2022
[63]

Auto-encoding score distribution regression for action quality JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 18 assessment,

B. Zhang, J. Chen, Y. Xu, H. Zhang, X. Yang, and X. Geng, “Auto-encoding score distribution regression for action quality JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 18 assessment,” Neural Computing and Applications , vol. 36, no. 2, pp. 929–942, 2024

work page 2021
[64]

Localization- assisted uncertainty score disentanglement network for action quality assessment,

Y. Ji, L. Ye, H. Huang, L. Mao, Y. Zhou, and L. Gao, “Localization- assisted uncertainty score disentanglement network for action quality assessment,” in ACM MM, pp. 8590–8597, 2023

[65] K. Zhou, J. Li, R. Cai, L. Wang, X. Zhang, and X. Liang, “CoFInAl: Enhancing action quality assessment with coarse-to-fine instruction alignment,” in IJCAI, 2024

[66] M. Li, H.-B. Zhang, Q. Lei, Z. Fan, J. Liu, and J.-X. Du, “Pairwise contrastive learning network for action quality assessment,” in ECCV, pp. 457–473, Springer, 2022

[67] X. Ke, H. Xu, X. Lin, and W. Guo, “Two-path target-aware contrastive regression for action quality assessment,” Information Sciences, vol. 664, p. 120347, 2024

[68] Q. An, M. Qi, and H. Ma, “Multi-stage contrastive regression for action quality assessment,” in ICASSP, pp. 4110–4114, IEEE, 2024

[69] Z. Luo, Y. Xiao, F. Yang, J. T. Zhou, and Z. Fang, “Rhythmer: Ranking-based skill assessment with rhythm-aware transformer,” IEEE TCSVT, 2024

[70] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2D pose estimation using part affinity fields,” in CVPR, pp. 7291–7299, 2017

[71] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C.-L. Chang, M. G. Yong, J. Lee, et al., “MediaPipe: A framework for building perception pipelines,” arXiv preprint arXiv:1906.08172, 2019

[72] Y. Xu, J. Zhang, Q. Zhang, and D. Tao, “ViTPose: Simple vision transformer baselines for human pose estimation,” NeurIPS, vol. 35, pp. 38571–38584, 2022

[73] X. Wang, J. Li, and H. Hu, “Skeleton-based action quality assessment via partially connected LSTM with triplet losses,” in PRCV, pp. 220–232, Springer, 2022

[74] B. X. B. Yu, Y. Liu, X. Zhang, G. Chen, and K. C. C. Chan, “EGCN: An ensemble-based learning framework for exploring effective skeleton-based rehabilitation exercise assessment,” in IJCAI, pp. 3681–3687, 2022

[75] X. Bruce, Y. Liu, K. C. Chan, and C. W. Chen, “EGCN++: A new fusion strategy for ensemble learning in skeleton-based rehabilitation exercise assessment,” IEEE TPAMI, 2024

[76] C. Li, X. Ling, and S. Xia, “A graph convolutional siamese network for the assessment and recognition of physical rehabilitation exercises,” in ICANN, pp. 229–240, Springer, 2023

[77] X. Bruce, Y. Liu, K. C. Chan, Q. Yang, and X. Wang, “Skeleton-based human action evaluation using graph convolutional network for monitoring Alzheimer’s progression,” PR, vol. 119, p. 108095, 2021

[78] Y. Liao, A. Vakanski, and M. Xian, “A deep learning framework for assessing physical rehabilitation exercises,” IEEE TNSRE, vol. 28, no. 2, pp. 468–477, 2020

[79] S. Yan, Y. Xiong, and D. Lin, “Spatial temporal graph convolutional networks for skeleton-based action recognition,” in AAAI, vol. 32, 2018

[80] C. Zhou, J. Zeng, L. Qiu, S. Wang, P. Liu, and J. Pan, “An attention-based adaptive spatial–temporal graph convolutional network for long-video ergonomic risk assessment,” Engineering Applications of Artificial Intelligence, vol. 131, p. 107780, 2024
