Gaze-guided hand-object interaction synthesis: Benchmark and method

2024 · arXiv 2403.16169

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Reading Recognition in the Wild

cs.CV · 2025-05-30 · unverdicted · novelty 8.0

Introduces the Reading in the Wild dataset and a flexible transformer model using egocentric RGB, eye gaze, and head pose modalities to recognize reading activity in diverse real-world scenarios.

SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

SynAgent enables generalizable cooperative humanoid manipulation by transferring skills from solo human-object interactions to multi-agent scenarios via interaction-preserving retargeting, single-agent pretraining with multi-agent PPO, and a conditional VAE generative policy.

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

cs.RO · 2025-07-16 · conditional · novelty 6.0

EgoVLA pretrains VLA models on egocentric human videos, retargets predicted actions to robots via IK, and fine-tunes on few robot demos to improve bimanual manipulation performance on a new simulation benchmark.

citing papers explorer

Reading Recognition in the Wild · cs.CV · 2025-05-30 · unverdicted · none · ref 42
Introduces the Reading in the Wild dataset and a flexible transformer model using egocentric RGB, eye gaze, and head pose modalities to recognize reading activity in diverse real-world scenarios.

SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy · cs.CV · 2026-04-20 · unverdicted · none · ref 34
SynAgent enables generalizable cooperative humanoid manipulation by transferring skills from solo human-object interactions to multi-agent scenarios via interaction-preserving retargeting, single-agent pretraining with multi-agent PPO, and a conditional VAE generative policy.

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos · cs.RO · 2025-07-16 · conditional · none · ref 12
EgoVLA pretrains VLA models on egocentric human videos, retargets predicted actions to robots via IK, and fine-tunes on few robot demos to improve bimanual manipulation performance on a new simulation benchmark.
