If there is more than one text matching the same image, we select the longest one.
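The selection rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function name `keep_longest_caption`, the `(image_id, text)` pair layout, and the choice to measure length in characters and keep the first caption on ties are all assumptions made for the example.

```python
def keep_longest_caption(pairs):
    """Return a dict mapping each image id to its longest caption.

    `pairs` is an iterable of (image_id, text) tuples (hypothetical layout).
    Length is measured in characters; on ties the first caption seen wins.
    """
    best = {}
    for image_id, text in pairs:
        # Replace the stored caption only when the new one is strictly longer.
        if image_id not in best or len(text) > len(best[image_id]):
            best[image_id] = text
    return best


pairs = [
    ("a.jpg", "a dog"),
    ("a.jpg", "a dog on the beach"),
    ("b.jpg", "cat"),
]
result = keep_longest_caption(pairs)
```

Here `result` keeps one caption per image, so the shorter "a dog" entry for `a.jpg` is dropped.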

Cleaning the text with certain irregular patterns. For academic caption datasets, we remove pairs whose text contains the special tags in CC12M (Changpinyo et al., 2021
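A tag-based filter of this kind might look like the following sketch. The excerpt does not list the tags, so the pattern here is an assumption: it treats any upper-case token in angle brackets (for example `<PERSON>`) as a special tag. The names `TAG_PATTERN` and `drop_tagged_pairs` are hypothetical.

```python
import re

# Assumed tag shape: an upper-case placeholder in angle brackets, e.g. <PERSON>.
# The real set of tags is not given in the excerpt above.
TAG_PATTERN = re.compile(r"<[A-Z_]+>")


def drop_tagged_pairs(pairs):
    """Keep only the (image_id, text) pairs whose text contains no special tag."""
    return [(image_id, text) for image_id, text in pairs
            if not TAG_PATTERN.search(text)]


pairs = [
    ("a.jpg", "<PERSON> standing on the beach"),
    ("b.jpg", "a cat sleeping on a sofa"),
]
cleaned = drop_tagged_pairs(pairs)
```

Dropping the whole pair, rather than stripping the tag from the text, follows the wording "we remove pairs" in the passage.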

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

cs.CV · 2023-08-24 · unverdicted · novelty 6.0

Qwen-VL and its chat variant achieve new state-of-the-art results among similar-scale generalist models on image captioning, visual question answering, grounding, text reading, and dialog benchmarks via targeted vision integration and training.

citing papers explorer

Showing 1 of 1 citing paper.

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

cs.CV · 2023-08-24 · unverdicted · none · ref 8
