Structured multimodal attentions for textvqa.arXiv preprint arXiv:2006.00753,

Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton van den Hengel, Qi Wu · 2006 · arXiv 2006.00753

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

GIT: A Generative Image-to-text Transformer for Vision and Language

cs.CV · 2022-05-27 · unverdicted · novelty 5.0

GIT achieves new state-of-the-art results on 12 vision-language benchmarks, including surpassing human performance on TextCaps, via a simplified single-encoder single-decoder transformer scaled on large pre-training data.

citing papers explorer

Showing 1 of 1 citing paper.

GIT: A Generative Image-to-text Transformer for Vision and Language cs.CV · 2022-05-27 · unverdicted · none · ref 10
GIT achieves new state-of-the-art results on 12 vision-language benchmarks, including surpassing human performance on TextCaps, via a simplified single-encoder single-decoder transformer scaled on large pre-training data.

Structured multimodal attentions for textvqa.arXiv preprint arXiv:2006.00753,

fields

years

verdicts

representative citing papers

citing papers explorer