
arxiv: 2412.11149 · v2 · pith:CDUTTEICnew · submitted 2024-12-15 · 💻 cs.CV

A Comprehensive Survey of Action Quality Assessment: Method and Benchmark

Pith reviewed 2026-05-23 06:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords action quality assessment · survey · taxonomy · benchmark · video-based methods · skeleton-based methods · multi-modal approaches · computer vision

The pith

A modality-driven hierarchical taxonomy organizes AQA methods by input type while a unified benchmark standardizes comparisons for video-based approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the problem that AQA research uses varied datasets and evaluation settings that prevent direct method comparisons. It introduces a modality-driven hierarchical taxonomy that places methods into video-based, skeleton-based, and multi-modal groups and traces how representative models have developed within those groups. The work also builds a unified benchmark that merges several datasets under shared protocols to measure both accuracy and computational efficiency for video-based methods. Readers care because AQA supports practical uses in sports analysis, skill training, and healthcare where reliable scoring matters. The survey ends by mapping current trends, open challenges, and possible research paths.

Core claim

Existing AQA studies rely on heterogeneous datasets and evaluation settings that make systematic comparisons across methods difficult. The survey proposes a modality-driven hierarchical taxonomy that organizes methods into video-based, skeleton-based, and multi-modal approaches and analyzes the evolution of representative models. It further establishes a unified benchmark that integrates diverse datasets and applies standardized evaluation protocols to representative video-based AQA methods, allowing consistent comparison on accuracy and computational efficiency. The paper then examines emerging trends, identifies key challenges, and outlines future directions ranging from near-term methodological advances to longer-term opportunities enabled by emerging AI paradigms.

What carries the argument

The modality-driven hierarchical taxonomy that classifies AQA methods according to input modality, together with the unified benchmark that combines datasets and protocols for video-based methods.

If this is right

  • Video-based AQA methods can be compared directly on both accuracy and efficiency using the shared protocols.
  • Methodological changes across video, skeleton, and multi-modal categories become easier to track over time.
  • Key challenges in current AQA work are listed for focused attention in follow-on studies.
  • Future research directions are separated into near-term modeling advances and longer-term uses of new AI paradigms.
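
One common way to read a joint accuracy and efficiency comparison is as a Pareto front: a method stays on the front if no other method is at least as accurate and at least as cheap, and strictly better on one of the two. The sketch below is only an illustration of that idea. The function name `pareto_front`, the `(accuracy, cost)` tuple layout, and any method names or numbers passed to it are assumptions for the example, not results or tooling from the paper.

```python
from typing import Dict, List, Tuple


def pareto_front(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of non-dominated methods, sorted alphabetically.

    `results` maps a (hypothetical) method name to (accuracy, cost), where
    higher accuracy is better and lower cost (e.g. compute) is better.
    """
    front = []
    for name, (acc, cost) in results.items():
        dominated = any(
            # Another method is no worse on both axes and strictly better on one.
            o_acc >= acc and o_cost <= cost and (o_acc > acc or o_cost < cost)
            for other, (o_acc, o_cost) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)
```

Calling `pareto_front` on a dictionary of per-method measurements returns the methods that represent a defensible accuracy and efficiency trade-off; everything else is beaten on both axes by some alternative.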

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy could be updated to include emerging input types such as depth or wearable sensor streams.
  • The benchmark protocols might be reused or adapted to create similar standardized tests for skeleton-based or multi-modal methods.
  • Efficiency results from the benchmark could inform deployment choices in real-time applications like coaching tools.
  • The organization of methods by modality may reveal under-explored combinations that future work could test.

Load-bearing premise

The chosen representative methods and datasets, when placed under standardized protocols, still support valid cross-method comparisons despite differences among the original datasets.

What would settle it

Re-running the benchmark protocols on a fresh collection of datasets or with alternate evaluation metrics produces substantially reordered accuracy or efficiency rankings among the same methods.
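
A minimal sketch of how "substantially reordered rankings" could be quantified: compute the Spearman rank correlation between the method ranking from the original benchmark run and the ranking from a re-run. Values near 1 mean the ordering held, and values near 0 or below mean it did not. The function names, the dictionary layout, and any method names or scores are hypothetical; the paper does not prescribe this check.

```python
from typing import Dict


def ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank methods from best (1) to worst; tied scores share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of methods tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for name in ordered[i:j + 1]:
            result[name] = avg_rank
        i = j + 1
    return result


def spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if set(a) != set(b):
        raise ValueError("both runs must score the same set of methods")
    names = sorted(a)
    ra, rb = ranks(a), ranks(b)
    xs = [ra[n] for n in names]
    ys = [rb[n] for n in names]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all ranks are tied")
    return cov / (vx * vy) ** 0.5
```

Passing two dictionaries that map the same method names to scores from the two runs yields a single number summarizing how much the ordering changed.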

Figures

Figures reproduced from arXiv: 2412.11149 by Hubert P. H. Shum, Kanglei Zhou, Liyuan Wang, Ruizhi Cai, Xiaohui Liang.

Figure 1: Annual statistics of AQA papers in CV and ML conferences or …
Figure 2: The overall structure of our comprehensive survey. Our survey presents three core contributions: a …
Figure 3: Illustration of the common AQA framework, consisting of three …
Figure 4: Three typical fine-grained reasoning approaches in AQA. (a) …
Figure 5: The procedural nature of actions in fine temporal modeling for …
Figure 7: Illustration of typical multi-modal AQA methods.
Figure 8: Computational performance comparison with selected baselines …

Original abstract

Action Quality Assessment (AQA) aims to automatically evaluate how well human actions are performed and has been widely applied in sports analysis, skill assessment, and healthcare. However, AQA studies are often developed under heterogeneous datasets and evaluation settings, making systematic comparison across methods difficult. To address these challenges, we present a comprehensive survey of recent advances in AQA. In particular, we propose a modality-driven hierarchical taxonomy that organizes existing methods into video-based, skeleton-based, and multi-modal approaches, and analyze the methodological evolution of representative models. We further establish a unified benchmark for representative video-based AQA methods by integrating diverse datasets and standardized evaluation protocols, enabling consistent comparison in terms of both accuracy and computational efficiency. Finally, we analyze emerging research trends, identify key challenges in current AQA research, and outline future directions ranging from near-term methodological advances to longer-term opportunities enabled by emerging AI paradigms. The project web page can be found at https://ZhouKanglei.github.io/AQA-Survey.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper surveys recent advances in Action Quality Assessment (AQA), proposing a modality-driven hierarchical taxonomy that classifies methods into video-based, skeleton-based, and multi-modal categories while analyzing their methodological evolution. It further establishes a unified benchmark for representative video-based AQA methods through integration of diverse datasets under standardized evaluation protocols, enabling comparisons on accuracy and efficiency, and concludes with analysis of trends, challenges, and future directions.

Significance. If the taxonomy provides a clear organizing framework and the benchmark delivers reproducible, valid cross-method comparisons, the work would offer a useful reference point for a fragmented research area, potentially reducing redundant experimentation and highlighting efficiency-accuracy trade-offs in AQA.

major comments (2)
  1. [Abstract] Abstract and benchmark description: the claim that standardized protocols enable 'consistent comparison' across heterogeneous datasets is load-bearing for the central benchmark contribution, yet the manuscript provides no explicit description of normalization for differing scoring scales (absolute vs. relative) or domain shifts (sports vs. healthcare), leaving open the possibility that reported rankings reflect unification artifacts rather than intrinsic method properties.
  2. [Benchmark section] Benchmark integration section: without reported per-dataset score rescaling, subset selection criteria, or domain-adaptation checks, the unified evaluation protocol risks invalidating cross-dataset accuracy and efficiency comparisons; this directly affects the validity of the 'representative' method rankings presented.
minor comments (2)
  1. [Abstract] The project webpage URL is given but no details on whether benchmark code or dataset splits are released, which would strengthen reproducibility claims.
  2. [Taxonomy section] Taxonomy figure or table would benefit from explicit inclusion criteria for methods to avoid selection bias in the hierarchical organization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which correctly identify areas where the benchmark contribution requires greater transparency. We will revise the manuscript to supply the missing methodological details on normalization and protocol standardization.

Point-by-point responses
  1. Referee: [Abstract] Abstract and benchmark description: the claim that standardized protocols enable 'consistent comparison' across heterogeneous datasets is load-bearing for the central benchmark contribution, yet the manuscript provides no explicit description of normalization for differing scoring scales (absolute vs. relative) or domain shifts (sports vs. healthcare), leaving open the possibility that reported rankings reflect unification artifacts rather than intrinsic method properties.

    Authors: We agree that the abstract and benchmark description would be strengthened by explicit statements on these points. In revision we will add a concise paragraph to the abstract and a dedicated methods subsection that describes: (i) the score normalization applied to each dataset (min-max to [0,1] for absolute scores and rank-based conversion for relative scores), (ii) the criteria used to select comparable action subsets across sports and healthcare domains, and (iii) the absence of explicit domain-adaptation modules together with the rationale that cross-dataset comparison is performed only after per-dataset standardization. These additions will make the unification process reproducible and will allow readers to assess whether rankings reflect method properties. revision: yes

  2. Referee: [Benchmark section] Benchmark integration section: without reported per-dataset score rescaling, subset selection criteria, or domain-adaptation checks, the unified evaluation protocol risks invalidating cross-dataset accuracy and efficiency comparisons; this directly affects the validity of the 'representative' method rankings presented.

    Authors: The referee correctly notes that the current text does not report these implementation details. We will expand the benchmark integration section with: (1) the exact rescaling formulas and code-level implementation for each dataset, (2) the subset selection rules (e.g., action classes present in at least three datasets, minimum sample size thresholds), and (3) a short discussion of domain shift mitigation (or its absence) together with any post-hoc checks performed. If certain normalizations prove infeasible for particular datasets, we will state the limitation and qualify the corresponding rankings accordingly. revision: yes
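
The two normalizations named in the simulated rebuttal (min-max scaling to [0, 1] for absolute scores, rank-based conversion for relative scores) can be sketched as follows. This is an illustrative reading only: the function names, the tie handling, and the choice to map ranks onto [0, 1] are assumptions, and the manuscript's actual formulas are not given in the text above.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Linearly rescale absolute scores from one dataset to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores identical, so there is no spread to scale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Convert scores to rank positions in [0, 1]; ties share the average rank.

    Only the ordering of the inputs matters, which suits relative scores.
    """
    n = len(scores)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of entries tied with the one at position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2
        for k in order[i:j + 1]:
            ranks[k] = avg
        i = j + 1
    return [r / (n - 1) for r in ranks]
```

Applying either function separately to each dataset keeps the normalization per-dataset, which matches the rebuttal's statement that cross-dataset comparison happens only after per-dataset standardization.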

Circularity Check

0 steps flagged

Survey paper with no derivation chain exhibits no circularity

Full rationale

This is a literature survey that organizes existing AQA methods into a modality-driven taxonomy and re-evaluates representative video-based methods on integrated datasets under standardized protocols. No equations, fitted parameters, predictions, or uniqueness theorems appear in the manuscript. All claims rest on citations to external prior work rather than any self-contained derivation that reduces to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the work introduces no new free parameters, axioms, or invented entities; it compiles and structures existing published methods and datasets.

pith-pipeline@v0.9.0 · 5715 in / 1031 out tokens · 66546 ms · 2026-05-23T06:52:30.253529+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Parameter-Efficient Multi-View Proficiency Estimation: From Discriminative Classification to Generative Feedback

    cs.CV 2026-05 unverdicted novelty 5.0

    SkillFormer, PATS, and ProfVLM deliver state-of-the-art multi-view proficiency estimation on Ego-Exo4D with up to 20x fewer parameters by combining selective fusion, dense sampling, and generative feedback.
