CoTracker: It is Better to Track Together, October 2024
9 Pith papers cite this work.
citing papers explorer
- Functionalization via Structure Completion and Motion Rectification
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
- ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.
- Any-point Trajectory Modeling for Policy Learning
ATM pre-trains models to predict trajectories of any points in videos, then uses those predictions to learn strong visuomotor policies from minimal action labels, beating baselines by 80% on 130+ tasks.
- Zero-shot World Models Are Developmentally Efficient Learners
A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
- TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Visual trace prompting improves spatial-temporal awareness in VLA models, delivering 10% gains on SimplerEnv and 3.5x on real-robot tasks.
- RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos
RoDyGS separates static and dynamic elements in monocular videos using Gaussian splatting with regularization and introduces the Kubric-MRig benchmark for pose-free dynamic novel view synthesis.
- Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
Gen2Act enables generalizable robot manipulation for unseen objects and novel motions by using zero-shot human video generation from web data to condition a policy trained on an order of magnitude less robot interaction data.
- 3D Reconstruction with Spatial Memory
Spann3R uses a learned spatial memory to regress per-image pointmaps directly in a shared global coordinate system, removing the need for optimization-based alignment after per-pair predictions.
- SWoMo: Neuro-Symbolic World Model for Cataract Surgery Simulation
SWoMo decouples symbolic rule-based motion modeling via scene graphs from visual realism via diffusion models, trained through inverse pairing of real cataract surgery videos reconstructed in the simulator for sim-to-real translation.