LACE aligns human and robot visual features via semantic distribution matching on corresponding body parts plus a Gram loss, yielding 65% better zero-shot policy transfer than the DINO baseline.
Point policy: Unifying observations and actions with key points for robot manipulation
5 Pith papers cite this work.
Fields: cs.RO (5)
Citation roles: background (2)
Citation polarities: background (2)

Representative citing papers
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
X-Diffusion adapts Ambient Diffusion to selectively train on noised human actions for cross-embodiment robot policies, yielding 16% higher average success rates than naive co-training or manual filtering across five real-world manipulation tasks.
KIL, using foundation-model keypoints, reaches 75% success on five manipulation tasks, outperforming RGB inputs (47%) and roughly matching S2-diffusion (73%), across more than 2,000 real-world rollouts that include generalization tests on unseen objects.
Citing papers explorer
On the Generalization Capabilities, Design Choices and Limitations of Keypoint Imitation Learning