hub

TAPIP3D: Tracking any point in persistent 3d geom- etry

· 2025 · arXiv 2504.14717

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 method 1

citation-polarity summary

background 3 use method 1

representative citing papers

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

cs.CV · 2026-05-12 · unverdicted · novelty 8.0

TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prior methods on ORAD-3D and RELLIS-3D while generalizing zero-shot.

Towards Visual Query Localization in the 3D World

cs.CV · 2026-05-02 · unverdicted · novelty 7.0

The authors release the 3DVQL benchmark for 3D multimodal visual query localization and show that a lift-and-attention fusion module outperforms prior fusion baselines on it.

Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations

cs.RO · 2026-05-21 · unverdicted · novelty 6.0

A framework learns invariant symbolic reward functions from few demonstrations that generalize zero-shot to variations in robotic manipulation tasks.

Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

A training-free Spatio-Temporal Attention Chain framework accelerates 4D mesh generation 13x, improves quality, scales to 16x longer videos, and supports downstream tracking and camera estimation.

LongDPM: Overlap-Aware 4D Reconstruction from Long Monocular Videos

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

LongDPM introduces an overlap-aware chunk-based framework that registers and fuses local dynamic reconstructions to achieve coherent long-range 4D geometry and tracking from monocular video.

Velox: Learning Representations of 4D Geometry and Appearance

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth simulation.

SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

SceneScribe-1M is a new dataset of 1 million videos with semantic text, camera parameters, dense depth, and consistent 3D point tracks to support monocular depth estimation, scene reconstruction, point tracking, and text-to-video synthesis.

DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass

cs.CV · 2025-12-15 · unverdicted · novelty 6.0

DePT3R performs joint dense point tracking and 3D reconstruction of dynamic scenes from multiple unposed images using a single neural network forward pass.

Syn4D: A Multiview Synthetic 4D Dataset

cs.CV · 2026-05-06 · unverdicted · novelty 5.0

Syn4D is a new multiview synthetic 4D dataset supplying dense ground-truth annotations for dynamic scene reconstruction, tracking, and human pose estimation.

citing papers explorer

Showing 10 of 10 citing papers.

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking cs.CV · 2026-05-12 · unverdicted · none · ref 84
TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.
Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes cs.CV · 2026-05-06 · unverdicted · none · ref 58
Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prior methods on ORAD-3D and RELLIS-3D while generalizing zero-shot.
Towards Visual Query Localization in the 3D World cs.CV · 2026-05-02 · unverdicted · none · ref 33
The authors release the 3DVQL benchmark for 3D multimodal visual query localization and show that a lift-and-attention fusion module outperforms prior fusion baselines on it.
Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations cs.RO · 2026-05-21 · unverdicted · none · ref 43
A framework learns invariant symbolic reward functions from few demonstrations that generalize zero-shot to variations in robotic manipulation tasks.
Fast 4D Mesh Generation by Spatio-Temporal Attention Chains cs.CV · 2026-05-19 · unverdicted · none · ref 94
A training-free Spatio-Temporal Attention Chain framework accelerates 4D mesh generation 13x, improves quality, scales to 16x longer videos, and supports downstream tracking and camera estimation.
LongDPM: Overlap-Aware 4D Reconstruction from Long Monocular Videos cs.CV · 2026-05-17 · unverdicted · none · ref 44
LongDPM introduces an overlap-aware chunk-based framework that registers and fuses local dynamic reconstructions to achieve coherent long-range 4D geometry and tracking from monocular video.
Velox: Learning Representations of 4D Geometry and Appearance cs.CV · 2026-05-06 · unverdicted · none · ref 116
Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth simulation.
SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations cs.CV · 2026-04-09 · unverdicted · none · ref 68
SceneScribe-1M is a new dataset of 1 million videos with semantic text, camera parameters, dense depth, and consistent 3D point tracks to support monocular depth estimation, scene reconstruction, point tracking, and text-to-video synthesis.
DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass cs.CV · 2025-12-15 · unverdicted · none · ref 43
DePT3R performs joint dense point tracking and 3D reconstruction of dynamic scenes from multiple unposed images using a single neural network forward pass.
Syn4D: A Multiview Synthetic 4D Dataset cs.CV · 2026-05-06 · unverdicted · none · ref 131
Syn4D is a new multiview synthetic 4D dataset supplying dense ground-truth annotations for dynamic scene reconstruction, tracking, and human pose estimation.

TAPIP3D: Tracking any point in persistent 3d geom- etry

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer