Masked autoencoders as spatiotemporal learners. Advances in neural information processing systems, 35:35946–35958,

Christoph Feichtenhofer, Yanghao Li, Kaiming He, et al.

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

InstAP introduces instance-aware pre-training with a new dual-granularity dataset InstVL that improves both fine-grained instance retrieval and global video understanding over standard VLP baselines.

Recurrent Video Masked Autoencoders

cs.CV · 2025-12-15 · unverdicted · novelty 7.0

RVM uses recurrent computation inside a masked autoencoder to learn video representations that match or exceed prior video and image models on classification, tracking, and dense spatial tasks with up to 30x better parameter efficiency.

Masked Autoencoders with Limited Data: Does It Work? A Fine-Grained Bioacoustics Case Study

cs.SD · 2026-05-13 · conditional · novelty 6.0

In moderate-sized fine-grained bioacoustics, pretraining scale of masked autoencoders on diverse general audio dominates over domain-specific objectives or data curation for transfer performance.

citing papers explorer

Showing 3 of 3 citing papers.

InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding · cs.CV · 2026-04-09 · unverdicted · none · ref 15
Recurrent Video Masked Autoencoders · cs.CV · 2025-12-15 · unverdicted · none · ref 28
Masked Autoencoders with Limited Data: Does It Work? A Fine-Grained Bioacoustics Case Study · cs.SD · 2026-05-13 · conditional · none · ref 13