IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
Dense- matcher: Learning 3d semantic correspondence for category- level manipulation from a single demo
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
DenseMarks learns a canonical 3D embedding space for human head images by training a Vision Transformer with contrastive loss on pairwise point tracks from in-the-wild videos, plus landmark and segmentation supervision.
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
SGSoft introduces a template-guided pipeline that fuses semantic and geometric features to learn dense correspondences across deformable 3D shapes with claimed SOTA generalization and real-time efficiency.
citing papers explorer
-
IGen: Scalable Data Generation for Robot Learning from Open-World Images
IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.
-
Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
DenseMarks learns a canonical 3D embedding space for human head images by training a Vision Transformer with contrastive loss on pairwise point tracks from in-the-wild videos, plus landmark and segmentation supervision.
-
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
-
SGSoft: Learning Fused Semantic-Geometric Features for 3D Shape Correspondence via Template-Guided Soft Signals
SGSoft introduces a template-guided pipeline that fuses semantic and geometric features to learn dense correspondences across deformable 3D shapes with claimed SOTA generalization and real-time efficiency.
- AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Afford Correspondence