Qwen-VL and its chat variant achieve new state-of-the-art results among similar-scale generalist models on image captioning, visual question answering, grounding, text reading, and dialog benchmarks via targeted vision integration and training.
If there is more than one text matching the same image, we select the longest one.
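The selection rule quoted above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `keep_longest_text`, the `(image_id, text)` input shape, measuring length in characters, and keeping the first-seen text on ties are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple


def keep_longest_text(pairs: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, str]:
    """Return one text per image, keeping the longest text seen for each image.

    Hypothetical helper: `pairs` yields (image_id, text) tuples. Length is
    measured in characters (an assumption; the source does not say whether
    characters or tokens are counted). On ties, the first-seen text is kept.
    """
    longest: Dict[Hashable, str] = {}
    for image_id, text in pairs:
        current = longest.get(image_id)
        # Replace only when this text is strictly longer than the stored one.
        if current is None or len(text) > len(current):
            longest[image_id] = text
    return longest


# Example: two images, the first with three candidate texts.
sample = [
    ("img_001", "a dog"),
    ("img_001", "a brown dog running on a beach"),
    ("img_001", "dog on beach"),
    ("img_002", "a red bicycle"),
]
result = keep_longest_text(sample)
```

A single pass with a dictionary keeps memory proportional to the number of distinct images rather than the number of pairs, which may matter for large image-text collections.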
1 Pith paper cites this work. Polarity classification is still indexing.
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond