MCF-Proto adds a motion-centric local action frame and prototype parameterization to VLA models, inducing emergent geometric structure and improved robustness from standard demonstrations alone.
Distilled feature fields enable few-shot language-guided manipulation
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 6representative citing papers
Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.
RecGen achieves state-of-the-art 3D multi-object scene reconstruction from sparse RGB-D views by combining compositional synthetic scene generation with strong 3D shape priors, outperforming SAM3D by 30%+ in shape quality and pose accuracy while using 80% fewer meshes.
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
DP3 uses compact 3D representations from sparse point clouds inside diffusion policies to learn generalizable visuomotor skills from few demonstrations, reporting 24% gains in simulation and 85% success on real robots.
A hybrid visual-motor imagery EEG decoder controls a robot for grasping and placement at 40% and 63% accuracy respectively, yielding 21% end-to-end task success in cue-free online use.
citing papers explorer
-
Beyond World-Frame Action Heads: Motion-Centric Action Frames for Vision-Language-Action Models
MCF-Proto adds a motion-centric local action frame and prototype parameterization to VLA models, inducing emergent geometric structure and improved robustness from standard demonstrations alone.
-
Action Images: End-to-End Policy Learning via Multiview Video Generation
Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.
-
Reconstruction by Generation: 3D Multi-Object Scene Reconstruction from Sparse Observations
RecGen achieves state-of-the-art 3D multi-object scene reconstruction from sparse RGB-D views by combining compositional synthetic scene generation with strong 3D shape priors, outperforming SAM3D by 30%+ in shape quality and pose accuracy while using 80% fewer meshes.
-
C3G: Learning Compact 3D Representations with 2K Gaussians
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
-
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
DP3 uses compact 3D representations from sparse point clouds inside diffusion policies to learn generalizable visuomotor skills from few demonstrations, reporting 24% gains in simulation and 85% success on real robots.
-
Robotic Grasping and Placement Controlled by EEG-Based Hybrid Visual and Motor Imagery
A hybrid visual-motor imagery EEG decoder controls a robot for grasping and placement at 40% and 63% accuracy respectively, yielding 21% end-to-end task success in cue-free online use.