Probing inter-modality: Visual parsing with self-attention for vision-language pre-training. arXiv preprint arXiv:2106.13488, 2021a

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.CV 1

years

2022 1

verdicts

UNVERDICTED 1

representative citing papers

GIT: A Generative Image-to-text Transformer for Vision and Language

cs.CV · 2022-05-27 · unverdicted · novelty 5.0

GIT achieves new state-of-the-art results on 12 vision-language benchmarks, including surpassing human performance on TextCaps, via a simplified single-encoder single-decoder transformer scaled on large pre-training data.

citing papers explorer

Showing 1 of 1 citing paper.

  • GIT: A Generative Image-to-text Transformer for Vision and Language cs.CV · 2022-05-27 · unverdicted · none · ref 32
