pith. sign in

Masked autoencoders as spatiotemporal learners.Advances in neural information processing systems, 35:35946–35958

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields

cs.CV 2 cs.SD 1

years

2026 2 2025 1

clear filters

representative citing papers

Recurrent Video Masked Autoencoders

cs.CV · 2025-12-15 · unverdicted · novelty 7.0

RVM uses recurrent computation inside a masked autoencoder to learn video representations that match or exceed prior video and image models on classification, tracking, and dense spatial tasks with up to 30x better parameter efficiency.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding cs.CV · 2026-04-09 · unverdicted · none · ref 15

    InstAP introduces instance-aware pre-training with a new dual-granularity dataset InstVL that improves both fine-grained instance retrieval and global video understanding over standard VLP baselines.

  • Recurrent Video Masked Autoencoders cs.CV · 2025-12-15 · unverdicted · none · ref 28

    RVM uses recurrent computation inside a masked autoencoder to learn video representations that match or exceed prior video and image models on classification, tracking, and dense spatial tasks with up to 30x better parameter efficiency.