Learning fine-grained bimanual manipulation with low-cost hardware
2 Pith papers cite this work.
fields: cs.RO (2)
roles: background (1)
polarities: background (1)
citing papers explorer
- Mobile UMI: Cross-View Diffusion Policy with Decoupled Kinematics for Mobile Manipulation
  A hardware-free dual-camera capture framework with ChArUco spatial unification and receding-horizon state alignment enables decoupled SE(3) manipulation and SE(2) base trajectories for diffusion policies, yielding 83.8% average success on four long-horizon household tasks.
- A Survey on Vision-Language-Action Models for Embodied AI
  This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.