An image is worth 16x16 words: Transformers for image recognition at scale. arXiv, 2020

Alexey Dosovitskiy · 2020

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

A vision-language model is finetuned on 114k anonymized relational captions to embed images by their underlying structural correspondences instead of visible attributes.

citing papers explorer

Showing 1 of 1 citing paper.

Relational Visual Similarity · cs.CV · 2025-12-08 · unverdicted · none · ref 36
