Self-supervised learning with Swin Transformers
3 Pith papers cite this work.
fields: cs.CV (3)
roles: background (1)
polarities: background (1)
representative citing papers
BEiT pre-trains vision transformers via masked image modeling on visual tokens and reaches 83.2% ImageNet top-1 accuracy for the base model and 86.3% for the large model using only ImageNet-1K data.
SolarCHIP contrastively pretrains CNN and Vision Transformer backbones on SDO AIA-HMI data with multi-granularity objectives, achieving SOTA on cross-modal translation and flare classification especially in low-resource settings.
citing papers explorer
- iBOT: Image BERT Pre-Training with Online Tokenizer
  iBOT achieves 82.3% linear probing accuracy and 87.8% fine-tuning accuracy on ImageNet-1K using masked image modeling with a jointly trained online tokenizer.
- BEiT: BERT Pre-Training of Image Transformers
  BEiT pre-trains vision transformers via masked image modeling on visual tokens and reaches 83.2% ImageNet top-1 accuracy for the base model and 86.3% for the large model using only ImageNet-1K data.
- Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory Records
  SolarCHIP contrastively pretrains CNN and Vision Transformer backbones on SDO AIA-HMI data with multi-granularity objectives, achieving SOTA on cross-modal translation and flare classification, especially in low-resource settings.