EDITH combines egocentric vision and gaze from smart glasses with language in a hierarchical policy to let robots interpret brief nonverbal human intent and reduce user effort in interactive tasks.
Give me a strawberry muffin, a cherry muffin, and an Oreo muffin
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.RO (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Gaze2Act conditions VLA policies on mapped human gaze for precise object and interaction specification, reporting SOTA intent accuracy and success across 16 real-robot tasks on a Unitree G1 humanoid.
citing papers explorer
- Hierarchical Policies from Verbal and Egocentric Human Signals for Natural Human-Robot Interaction
EDITH combines egocentric vision and gaze from smart glasses with language in a hierarchical policy to let robots interpret brief nonverbal human intent and reduce user effort in interactive tasks.
- Gaze2Act: Gaze-Conditioned Vision-Language-Action Policies for Interactive Robot Manipulation
Gaze2Act conditions VLA policies on mapped human gaze for precise object and interaction specification, reporting SOTA intent accuracy and success across 16 real-robot tasks on a Unitree G1 humanoid.