Captioning images taken by people who are blind. arXiv preprint arXiv:2002.08565

Danna Gurari, Yinan Zhao, Meng Zhang, Nilavra Bhattacharya · 2020 · arXiv 2002.08565

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

GIT: A Generative Image-to-text Transformer for Vision and Language

cs.CV · 2022-05-27 · unverdicted · novelty 5.0

GIT achieves new state-of-the-art results on 12 vision-language benchmarks, including surpassing human performance on TextCaps, via a simplified single-encoder single-decoder transformer scaled on large pre-training data.

citing papers explorer

Showing 1 of 1 citing paper.

GIT: A Generative Image-to-text Transformer for Vision and Language

cs.CV · 2022-05-27 · unverdicted · none · ref 11

GIT achieves new state-of-the-art results on 12 vision-language benchmarks, including surpassing human performance on TextCaps, via a simplified single-encoder single-decoder transformer scaled on large pre-training data.
