Make RepVGG Greater Again: A Quantization-aware Approach
2 Pith papers cite this work. Polarity classification is still indexing.
citation summary
- fields: cs.CV (2)
- verdicts: UNVERDICTED (2)
- roles: background (1)
- polarities: background (1)
citing papers explorer
- MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
  MobileVLM achieves on-par performance with much larger vision-language models on standard benchmarks while delivering state-of-the-art inference speeds of 21.5 tokens per second on Snapdragon 888 CPU and 65.3 on Jetson Orin GPU.
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  MobileVLM V2 shows that 1.7B and 3B parameter vision-language models can reach or exceed the performance of 3B and 7B+ models on common VLM benchmarks via targeted design and data improvements.