IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
DenseMatcher: Learning 3D semantic correspondence for category-level manipulation from a single demo
5 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Citation-polarity summary: background 1
Representative citing papers
DenseMarks learns a canonical 3D embedding space for human head images by training a Vision Transformer with contrastive loss on pairwise point tracks from in-the-wild videos, plus landmark and segmentation supervision.
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
SGSoft introduces a template-guided pipeline that fuses semantic and geometric features to learn dense correspondences across deformable 3D shapes with claimed SOTA generalization and real-time efficiency.