CoTracker: It is Better to Track Together, October 2024

Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht · 2023 · arXiv 2307.07635

9 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background: 1 · dataset: 1

citation-polarity summary

background: 1 · use dataset: 1

representative citing papers

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation

cs.RO · 2024-09-03 · conditional · novelty 7.0

ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.

Any-point Trajectory Modeling for Policy Learning

cs.RO · 2023-12-28 · conditional · novelty 7.0

ATM pre-trains models to predict trajectories of any points in videos, then uses those predictions to learn strong visuomotor policies from minimal action labels, beating baselines by 80% on 130+ tasks.

Zero-shot World Models Are Developmentally Efficient Learners

cs.AI · 2026-04-11 · unverdicted · novelty 6.0

A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

cs.RO · 2024-12-13 · conditional · novelty 6.0

Visual trace prompting improves spatial-temporal awareness in VLA models, delivering 10% gains on SimplerEnv and 3.5x on real-robot tasks.

RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos

cs.CV · 2024-12-04 · unverdicted · novelty 6.0

RoDyGS separates static and dynamic elements in monocular videos using Gaussian splatting with regularization and introduces the Kubric-MRig benchmark for pose-free dynamic novel view synthesis.

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation

cs.RO · 2024-09-24 · unverdicted · novelty 6.0

Gen2Act enables generalizable robot manipulation for unseen objects and novel motions by using zero-shot human video generation from web data to condition a policy trained on an order of magnitude less robot interaction data.

3D Reconstruction with Spatial Memory

cs.CV · 2024-08-28 · unverdicted · novelty 6.0

Spann3R uses a learned spatial memory to regress per-image pointmaps directly in a shared global coordinate system, removing the need for optimization-based alignment after per-pair predictions.

SWoMo: Neuro-Symbolic World Model for Cataract Surgery Simulation

cs.CV · 2026-05-15 · conditional · novelty 5.0

SWoMo decouples symbolic rule-based motion modeling via scene graphs from visual realism via diffusion models, trained through inverse pairing of real cataract surgery videos reconstructed in the simulator for sim-to-real translation.

citing papers explorer


Functionalization via Structure Completion and Motion Rectification · cs.CV · 2026-05-18 · unverdicted · none · ref 154
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation · cs.RO · 2024-09-03 · conditional · none · ref 144
Any-point Trajectory Modeling for Policy Learning · cs.RO · 2023-12-28 · conditional · none · ref 20
Zero-shot World Models Are Developmentally Efficient Learners · cs.AI · 2026-04-11 · unverdicted · none · ref 53
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies · cs.RO · 2024-12-13 · conditional · none · ref 79
RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos · cs.CV · 2024-12-04 · unverdicted · none · ref 24
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation · cs.RO · 2024-09-24 · unverdicted · none · ref 63
3D Reconstruction with Spatial Memory · cs.CV · 2024-08-28 · unverdicted · none · ref 37
SWoMo: Neuro-Symbolic World Model for Cataract Surgery Simulation · cs.CV · 2026-05-15 · conditional · none · ref 19
