
arxiv: 2606.17427 · v1 · pith:ZFEKEJ4Gnew · submitted 2026-06-16 · 💻 cs.CV · cs.HC

Impact of Hand Impairment and Occlusions on Hand Pose Estimation Accuracy in Augmented Reality Applications

Pith reviewed 2026-06-27 02:03 UTC · model grok-4.3

classification 💻 cs.CV cs.HC
keywords hand pose estimation · augmented reality · HoloLens 2 · spinal cord injury · hand impairment · occlusion · rehabilitation · mixed reality

The pith

Hand pose estimation accuracy does not differ between people with cervical spinal cord injury and uninjured controls.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether mixed-reality hand tracking remains reliable when users have hand impairment from cervical spinal cord injury (cSCI) or when their hands are partially hidden by real objects during interaction tasks. Participants manipulated objects while a multi-camera system supplied ground-truth 3D joint locations, and the study compared HoloLens 2 predictions against several state-of-the-art algorithms. Accuracy did not differ between the two participant groups. Only small differences appeared between clear and opaque objects (0.1 mm) and between the HoloLens 2 and the best-performing algorithms (about 2 mm). These findings matter for designing rehabilitation applications that must work for impaired hands in everyday settings.

Core claim

The study found that 3D joint predictions from the HoloLens 2 and from WiLoR, HaMeR, WildHands, and MediaPipe showed no accuracy difference between the cSCI group and uninjured controls. Clear objects produced a 0.1 mm advantage over opaque objects, and WiLoR and HaMeR outperformed the HoloLens 2 by about 2 mm. Ground truth came from triangulation across a multi-camera setup while participants interacted with real objects.

What carries the argument

Direct comparison of 3D hand joint predictions from the HoloLens 2 and four pose estimation algorithms against multi-camera triangulated ground truth during real-object interactions.
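A minimal sketch of that comparison, assuming the accuracy measure is a mean per-joint position error. The abstract does not define the metric, so this choice is an assumption, and the function name `mpjpe` and the three-joint sample data are hypothetical (the study tracks 21 joints per hand).

```python
import math


def mpjpe(pred, truth):
    """Mean Euclidean distance between corresponding 3D joints.

    pred, truth: equal-length sequences of (x, y, z) tuples in the same
    units and the same coordinate frame.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and the same length")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


# Hypothetical three-joint example in millimetres (illustrative values only).
truth = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
pred = [(0.0, 3.0, 4.0), (10.0, 0.0, 0.0), (20.0, 0.0, 10.0)]

# Per-joint distances are 5, 0 and 10 mm, so the mean is 5.0 mm.
print(mpjpe(pred, truth))
```

Averaging this value per participant and condition would give the kind of summary that group, object, and algorithm comparisons can then be run on.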

If this is right

  • The HoloLens 2 can support hand rehabilitation applications that involve real-object interactions.
  • Existing pose estimation algorithms generalize to populations with hand impairment from cSCI.
  • The collected dataset can be used to improve future algorithms for impaired-hand tracking.
  • Small accuracy gains from clear objects suggest occlusion by real objects has limited impact.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Rehabilitation apps could safely incorporate physical objects without major loss of tracking reliability.
  • Similar accuracy testing could be extended to other hand-impairment causes such as stroke or arthritis.
  • Individual calibration may still be needed even if group-level accuracy is preserved.

Load-bearing premise

Ground truth estimates of 3D joint positions generated via triangulation from a multi-camera setup are sufficiently accurate to serve as the reference for all comparisons.
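To make the premise concrete, the sketch below shows one simple way to recover a 3D point from several calibrated views: find the point closest, in the least-squares sense, to the viewing rays from each camera. This is an illustrative stand-in, not the paper's pipeline, whose triangulation details are not given here. The function name `triangulate_rays`, the camera positions, and the target point are all hypothetical.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_rays(origins, directions):
    """Least-squares 3D point closest to a set of rays.

    Each ray starts at a camera centre `o` and points along `d`.
    Solves  sum(I - d d^T) x = sum(I - d d^T) o  for x.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = _normalize(d)
        for r in range(3):
            for c in range(3):
                w = (1.0 if r == c else 0.0) - d[r] * d[c]
                A[r][c] += w
                b[r] += w * o[c]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; point is not determined")
    # Cramer's rule is adequate for a 3x3 system.
    point = []
    for k in range(3):
        Ak = [[b[r] if c == k else A[r][c] for c in range(3)] for r in range(3)]
        point.append(_det3(Ak) / det)
    return point


# Hypothetical setup: three cameras looking at a joint located at (1, 2, 3).
target = (1.0, 2.0, 3.0)
origins = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
directions = [[t - o for t, o in zip(target, org)] for org in origins]
print(triangulate_rays(origins, directions))  # recovers the target up to rounding
```

With noise-free rays the target is recovered exactly; in practice, errors in 2D joint detection or camera calibration perturb the rays, which is why the quality of the reference itself matters for every downstream comparison.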

What would settle it

An independent measurement of the same hand poses using a different sensor technology, such as electromagnetic markers, would falsify the accuracy claims if it showed large systematic discrepancies with the multi-camera triangulation.

Figures

Figures reproduced from arXiv: 2606.17427 by Cesar Marquez-Chin, Damian M. Manzone, Hardeep Singh, José Zariffa, Mathew Szymanowski, Melissa Marquez-Chin, Olga Taran, Shuo Cai, Tammy Zeng.

Figure 1. Left Panel: Depiction of the experimental setup with the participant lifting a block while wearing the HoloLens 2 AR headset and surrounded by five cameras. Right Panel: Depiction of the participant’s view from the headset’s egocentric perspective. The white line displayed the bottom edge of the recording window or field of view of the HMD.
Figure 2. Depiction of the 21 joints estimated by the HoloLens 2 and included …
Figure 3. Depiction of the opaque and clear versions of the block, credit card, …
Figure 4. Example of comparisons between the 3D joint positions estimated for the ground truth and each algorithm for one frame when an uninjured participant …
Figure 5. The insignificant main effect of group. Individual participants are …
Figure 6. The significant main effect of algorithm. All comparisons between …
Figure 7. The significant main effect of transparency. Individual participants are …
Figure 8. The significant group by transparency by algorithm interaction. After correction for multiple comparisons, the only clear-opaque difference was in …
Figure 9. The correlations between error and Prehension Performance Subscores separated by algorithm. Each participant has a data point for average error and …
Original abstract

Mixed reality applications can be designed for hand rehabilitation. Augmented reality (AR) head mounted displays (HMDs) specifically allow for ecologically valid tasks because individuals can see their real environment and interact with real objects while receiving additional cues on the HMD. While these applications rely on accurate hand pose estimation, there is a gap in investigating the influence of hand impairment or occlusion from real-object interactions on pose estimation accuracy. Further, comparisons between AR HMD predictions and state-of-the-art pose estimation methods have not been established. The current study assessed pose estimation accuracy of the HoloLens 2 HMD and state-of-the-art pose estimation algorithms (WiLoR, HaMeR, WildHands, and MediaPipe) while individuals with cervical spinal cord injury (cSCI; n = 13, Neurological Level of Injury: C3-C6; American Spinal Injury Association Impairment Scale: A-D) and 15 uninjured controls interacted with clear and opaque objects. Ground truth estimates of 3D joint positions were generated via triangulation from a multi-camera setup. Pose estimation accuracy did not differ between the cSCI and uninjured control groups suggesting that 3D joint predictions from the HoloLens 2 and pose estimation algorithms can generalize to populations with hand impairment. Further, clear objects provided a small accuracy advantage over opaque objects (0.1 mm) and predictions from both WiLoR and HaMeR were slightly more accurate than the HoloLens 2 (2 mm). Overall, these results suggest that the HoloLens 2 may be viable for hand rehabilitation applications and the dataset generated can be used to refine pose estimation methods for hand-impaired populations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript reports an empirical comparison of 3D hand pose estimation accuracy for the HoloLens 2 HMD and four algorithms (WiLoR, HaMeR, WildHands, MediaPipe) in cSCI participants (n=13, C3-C6, AIS A-D) versus uninjured controls (n=15). Participants interacted with clear and opaque objects; ground truth 3D joint positions were obtained by multi-camera triangulation. The central claims are that accuracy did not differ between groups (suggesting generalization to hand impairment), clear objects yielded a 0.1 mm advantage, and WiLoR/HaMeR outperformed HoloLens 2 by 2 mm. The authors conclude that HoloLens 2 is viable for AR hand rehabilitation and release a dataset for impaired-hand refinement.

Significance. If the no-difference claim is substantiated, the work would be moderately significant for AR rehabilitation applications, providing the first direct evidence that current pose estimators generalize across cSCI-related hand impairment. The release of a dataset containing impaired-hand interactions is a concrete strength. However, the reported differences are extremely small and the absence of statistical support or ground-truth validation leaves the practical implications unclear.

major comments (3)
  1. [Abstract] Abstract: the claim that 'Pose estimation accuracy did not differ between the cSCI and uninjured control groups' is presented without statistical tests, p-values, confidence intervals, or per-group variance measures. This directly undermines the generalization conclusion.
  2. [Abstract] Abstract (and implied Methods): the multi-camera triangulation ground truth is treated as an unbiased reference for all comparisons, yet no per-group reprojection error, localization uncertainty, or visibility statistics are reported. cSCI participants may exhibit atypical postures and reduced visibility that systematically increase triangulation error relative to controls, which would invalidate the null-group-difference result.
  3. [Abstract] Abstract: the numerical differences (0.1 mm for object opacity, 2 mm for algorithm vs. HoloLens 2) are stated without effect sizes, practical significance thresholds for AR hand tracking, or participant-level error distributions, making it impossible to judge whether they are meaningful.
minor comments (1)
  1. [Abstract] Abstract: the error metric (e.g., mean per-joint position error) and any participant exclusion criteria are not defined, reducing reproducibility.
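Major comment 2 asks for per-group reprojection error as a check on ground-truth quality. The sketch below shows what such a check could look like, assuming an undistorted pinhole camera model and 3D points already expressed in the camera's coordinate frame. The function names, intrinsics, and sample values are hypothetical and are not taken from the paper.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Project a 3D point (camera coordinates) to pixels with a pinhole model."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(point_cam, observed_px, fx, fy, cx, cy):
    """Pixel distance between the projected 3D point and the 2D detection."""
    u, v = project(point_cam, fx, fy, cx, cy)
    return math.hypot(u - observed_px[0], v - observed_px[1])


def mean_error_by_group(records):
    """records: iterable of (group_label, error_px); returns {group: mean error}."""
    totals = {}
    for group, err in records:
        s, n = totals.get(group, (0.0, 0))
        totals[group] = (s + err, n + 1)
    return {g: s / n for g, (s, n) in totals.items()}


# Hypothetical triangulated joint and its 2D detection in one camera.
err = reprojection_error((0.1, 0.0, 1.0), (743.0, 364.0),
                         fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
print(err)  # projection lands at (740, 360), so the error is 5 px

# Hypothetical per-joint errors tagged by participant group.
print(mean_error_by_group([("cSCI", 1.0), ("cSCI", 3.0), ("control", 2.0)]))
```

Comparable group means would support treating the triangulated reference as equally trustworthy for both groups; a clear gap would point to the bias the referee describes.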

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions will be made to improve clarity and rigor.

  1. Referee: [Abstract] Abstract: the claim that 'Pose estimation accuracy did not differ between the cSCI and uninjured control groups' is presented without statistical tests, p-values, confidence intervals, or per-group variance measures. This directly undermines the generalization conclusion.

    Authors: We agree the abstract should include statistical support for the no-difference claim. The full manuscript reports per-group means and standard deviations with overlapping distributions, and we performed group comparisons. We will revise the abstract to explicitly include p-values, confidence intervals, and variance measures. revision: yes

  2. Referee: [Abstract] Abstract (and implied Methods): the multi-camera triangulation ground truth is treated as an unbiased reference for all comparisons, yet no per-group reprojection error, localization uncertainty, or visibility statistics are reported. cSCI participants may exhibit atypical postures and reduced visibility that systematically increase triangulation error relative to controls, which would invalidate the null-group-difference result.

    Authors: This raises a valid methodological concern. We will add per-group reprojection error, localization uncertainty, and visibility statistics to the Methods and Results sections. These will be computed from the existing multi-camera data to confirm comparability of ground-truth quality across groups. revision: yes

  3. Referee: [Abstract] Abstract: the numerical differences (0.1 mm for object opacity, 2 mm for algorithm vs. HoloLens 2) are stated without effect sizes, practical significance thresholds for AR hand tracking, or participant-level error distributions, making it impossible to judge whether they are meaningful.

    Authors: We agree that effect sizes and context for practical significance are needed. We will incorporate effect sizes (e.g., Cohen's d), reference established AR hand-tracking error thresholds from the literature, and include participant-level error distributions in the revised abstract and results. revision: yes
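The statistics promised in these responses can be illustrated with a small two-group sketch: Cohen's d with a pooled standard deviation, plus a permutation test on the difference in means. This is a simplified stand-in under stated assumptions. The paper's actual analysis is not described here and appears to involve group, transparency, and algorithm factors, and the per-participant error values below are invented for illustration.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    na = len(a)
    observed = abs(sum(a) / na - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:na], pooled[na:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-participant mean errors in mm (illustrative only).
csci_errors = [14.2, 15.1, 13.8, 16.0, 14.9, 15.4]
control_errors = [14.5, 14.8, 15.2, 13.9, 15.7, 14.6]

print(cohens_d(csci_errors, control_errors))
print(permutation_p_value(csci_errors, control_errors))
```

A large p-value alone does not demonstrate equivalence; reporting the effect size with a confidence interval, or running an explicit equivalence test against a practical threshold, would speak more directly to the generalization claim.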

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with external ground truth

The paper reports an empirical accuracy comparison between hand-pose estimators (HoloLens 2, WiLoR, HaMeR, etc.) on cSCI vs. control participants, using multi-camera triangulation as an independent reference. No equations, fitted parameters, derivations, uniqueness theorems, or self-citations are invoked to support any claim. The central result (no group difference) is a direct statistical observation, not a quantity that reduces to its own inputs by construction. This matches the default case of a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical user study with no mathematical derivations, models, or new theoretical constructs; all claims rest on experimental measurements and standard accuracy comparisons.

pith-pipeline@v0.9.1-grok · 5873 in / 1022 out tokens · 40467 ms · 2026-06-27T02:03:16.534183+00:00 · methodology

