EgoZero: Robot learning from smart glasses
11 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.RO (11)
verdicts: UNVERDICTED (11)
roles: background (4)
citing papers explorer
- EgoEngine: From Egocentric Human Videos to High-Fidelity Dexterous Robot Demonstrations
  EgoEngine transforms egocentric human videos into high-fidelity robot data, enabling zero-shot visuomotor dexterous policy learning without real-robot demonstrations.
- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
  DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.
- ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining
  ACE-Ego-0 is a VLA pretraining framework that turns egocentric human videos into robot-format pseudo-actions via a video-to-action pipeline and trains jointly with robot data under a reliability-aware objective.
- Hand-centric Human-to-Robot Trajectory Transfer from Video Demonstrations via Open-World Contact Localization
  HOWTransfer recovers 3D hand motion from video, localizes contact intervals via hand-object cues, generates multi-modal grasp hypotheses, and edits trajectories to produce diverse robot-executable motions, achieving 86% success.
- WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations
  WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
- ActiveGlasses: Learning Manipulation with Active Vision from Ego-centric Human Demonstration
  ActiveGlasses learns robot manipulation from egocentric human demonstrations captured with active vision via smart glasses, achieving zero-shot transfer using object-centric point-cloud policies.
- EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World
  EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.
- X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations
  X-Diffusion adapts Ambient Diffusion to selectively train on noised human actions for cross-embodiment robot policies, yielding 16% higher average success rates than naive co-training or manual filtering across five real-world manipulation tasks.
- WARP: Whole-Body Retargeting for Learning from Offline Human Demonstrations
  WARP is an offline retargeting method using a SEW geometric solver to produce consistent whole-body robot trajectories from human demonstrations for zero-shot mobile manipulation.
- HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos
  HumanEgo reports 92.5% average success on four real robot tasks using only 15-30 minutes of human video per task and zero robot data, with zero-shot transfer to new robots and cameras.
- Human2Any: Human-to-Robot Transfer via Constraint-Aware Compositional Planning
  Human2Any transfers human video demonstrations to robots by representing tasks as object-object interactions and composing learned priors with robot-side planning.