Distilled feature fields enable few-shot language-guided manipulation

William Shen, Ge Yang, Alan Yu, Jansen Wong, Leslie Pack Kaelbling, Phillip Isola · 2023 · arXiv 2308.07931

6 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 2 · dataset: 1

citation-polarity summary

background: 2 · use dataset: 1

representative citing papers

Beyond World-Frame Action Heads: Motion-Centric Action Frames for Vision-Language-Action Models

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

MCF-Proto adds a motion-centric local action frame and prototype parameterization to VLA models, inducing emergent geometric structure and improved robustness from standard demonstrations alone.

Action Images: End-to-End Policy Learning via Multiview Video Generation

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.

Reconstruction by Generation: 3D Multi-Object Scene Reconstruction from Sparse Observations

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

RecGen achieves state-of-the-art 3D multi-object scene reconstruction from sparse RGB-D views by combining compositional synthetic scene generation with strong 3D shape priors, outperforming SAM3D by more than 30% in shape quality and pose accuracy while using 80% fewer meshes.

C3G: Learning Compact 3D Representations with 2K Gaussians

cs.CV · 2025-12-03 · unverdicted · novelty 6.0

C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

cs.RO · 2024-03-06 · unverdicted · novelty 6.0

DP3 uses compact 3D representations from sparse point clouds inside diffusion policies to learn generalizable visuomotor skills from few demonstrations, reporting 24% gains in simulation and 85% success on real robots.

Robotic Grasping and Placement Controlled by EEG-Based Hybrid Visual and Motor Imagery

cs.RO · 2026-03-03 · unverdicted · novelty 3.0

A hybrid visual-motor imagery EEG decoder controls a robot for grasping and placement at 40% and 63% accuracy respectively, yielding 21% end-to-end task success in cue-free online use.

citing papers explorer

Beyond World-Frame Action Heads: Motion-Centric Action Frames for Vision-Language-Action Models · cs.AI · 2026-05-12 · unverdicted · none · ref 41
Action Images: End-to-End Policy Learning via Multiview Video Generation · cs.CV · 2026-04-07 · unverdicted · none · ref 50
Reconstruction by Generation: 3D Multi-Object Scene Reconstruction from Sparse Observations · cs.CV · 2026-04-29 · unverdicted · none · ref 45
C3G: Learning Compact 3D Representations with 2K Gaussians · cs.CV · 2025-12-03 · unverdicted · none · ref 56
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations · cs.RO · 2024-03-06 · unverdicted · none · ref 58
Robotic Grasping and Placement Controlled by EEG-Based Hybrid Visual and Motor Imagery · cs.RO · 2026-03-03 · unverdicted · none · ref 5