Gaze-guided hand-object interaction synthesis: Benchmark and method
3 Pith papers cite this work.
Citing papers

- Reading Recognition in the Wild
  Introduces the Reading in the Wild dataset and a flexible transformer model using egocentric RGB, eye gaze, and head pose modalities to recognize reading activity in diverse real-world scenarios.
- SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy
  SynAgent enables generalizable cooperative humanoid manipulation by transferring skills from solo human-object interactions to multi-agent scenarios via interaction-preserving retargeting, single-agent pretraining with multi-agent PPO, and a conditional VAE generative policy.
- EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
  EgoVLA pretrains VLA models on egocentric human videos, retargets predicted actions to robots via IK, and fine-tunes on few robot demos to improve bimanual manipulation performance on a new simulation benchmark.