Qwen-VL and its chat variant achieve new state-of-the-art results among similar-scale generalist models on image captioning, visual question answering, grounding, text reading, and dialog benchmarks via targeted vision integration and training.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2023 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond