TAPVid-3D: A Benchmark for Tracking Any Point in 3D

Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, João Carreira, Andrew Zisserman, Gabriel Brostow, Carl Doersch · 2024 · arXiv 2407.05921

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

cs.CV · 2025-05-22 · unverdicted · novelty 6.0

Multi-SpatialMLLM integrates depth perception, visual correspondence, and dynamic perception into MLLMs via a 27M-sample MultiSPA dataset and benchmark, yielding gains on multi-frame spatial tasks.

GenMatter: Perceiving Physical Objects with Generative Matter Models

cs.CV · 2026-04-24 · unverdicted · novelty 5.0

GenMatter is a generative hierarchical model that groups low-level motion and high-level features into particles and clusters representing independently moveable physical entities, validated across dot kinematograms, camouflaged objects, and RGB videos.

citing papers explorer

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

cs.CV · 2025-05-22 · unverdicted · none · ref 34

GenMatter: Perceiving Physical Objects with Generative Matter Models

cs.CV · 2026-04-24 · unverdicted · none · ref 33
