SkillSight: Efficient First-Person Skill Assessment with Gaze

2025 · cs.CV · arXiv 2511.19629

1 Pith paper cites this work.



abstract

Egocentric perception on smart glasses could transform how we learn new skills in the physical world, but automatic skill assessment remains a fundamental technical challenge. We introduce SkillSight for power-efficient skill assessment from first-person data. Central to our approach is the hypothesis that skill level is evident not only in how a person performs an activity (video), but also in how they direct their attention when doing so (gaze). Our two-stage framework first learns to jointly model gaze and egocentric video when predicting skill level, then distills a gaze-only student model. At inference, the student model requires only gaze input, drastically reducing power consumption by eliminating continuous video processing. Experiments on three datasets spanning cooking, music, and sports establish, for the first time, the valuable role of gaze in skill understanding across diverse real-world settings. Our SkillSight teacher model achieves state-of-the-art performance, while our gaze-only student variant maintains high accuracy using 73x less power than competing methods. These results pave the way for in-the-wild AI-supported skill learning.
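The abstract says only that the second stage "distills a gaze-only student model" from the gaze+video teacher. It does not state the distillation objective. As an illustration, here is a minimal sketch of standard Hinton-style knowledge distillation. It assumes skill level is predicted as a classification over discrete levels, with teacher logits from the gaze+video model and student logits from the gaze-only model. The function names, the temperature, and the `alpha` weighting are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],  # hypothetical: output of the gaze-only student
    teacher_logits: Sequence[float],  # hypothetical: output of the gaze+video teacher
    label: int,                       # ground-truth skill-level index
    temperature: float = 2.0,         # assumed value, not from the paper
    alpha: float = 0.5,               # assumed soft/hard weighting, not from the paper
) -> float:
    """Generic per-example KD loss (an assumption, not SkillSight's stated objective).

    Mixes (a) KL(teacher || student) on temperature-softened distributions with
    (b) cross-entropy of the student against the ground-truth label.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Scaling by T^2 keeps soft-target gradients comparable across temperatures.
    soft = (temperature ** 2) * kl
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [0.2, 1.5, 3.0]  # made-up logits for three skill levels
    student = [0.5, 1.0, 2.0]
    print(distillation_loss(student, teacher, label=2))
```

At inference only the student is run, so no video frames are processed. This is the property the abstract credits for the reduced power draw.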

representative citing papers

SkillSpotter: Pose-Aware Multi-View Skilled Action Detection and Grading in Ego-Exo Videos

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

SkillSpotter raises class-specific mAP from 12.40 to 21.82 and balanced accuracy to 60.40% on Ego-Exo4D by adding adaptive temporal suppression, gated pose fusion, and bidirectional cross-view attention to temporal action detectors.
