An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy · 2020

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Recurrent Video Masked Autoencoders · cs.CV · 2025-12-15 · unverdicted · novelty 7.0 · ref 24

RVM uses recurrent computation inside a masked autoencoder to learn video representations that match or exceed prior video and image models on classification, tracking, and dense spatial tasks with up to 30x better parameter efficiency.
