Space-time correspondence as a contrastive random walk. Advances in Neural Information Processing Systems, 33:19545–19560.
Two Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
Representative citing paper: Temporally Consistent Long-Term Memory for 3D Single Object Tracking (ChronoTrack)
Citing papers explorer
- Recurrent Video Masked Autoencoders
  RVM uses recurrent computation inside a masked autoencoder to learn video representations that match or exceed prior video and image models on classification, tracking, and dense spatial tasks, with up to 30x better parameter efficiency.
- Temporally Consistent Long-Term Memory for 3D Single Object Tracking
  ChronoTrack enables effective long-term 3D single-object tracking in LiDAR by storing target features in compact learnable memory tokens, regularized by temporal-consistency and memory-cycle-consistency losses, and reaches state-of-the-art accuracy at 42 FPS.