AdaMMS merges heterogeneous MLLMs via architecture mapping, linear weight interpolation, and unsupervised hyper-parameter search, outperforming prior methods on vision-language benchmarks as the first such approach without labeled data.
Vizwiz grand challenge: Answering visual questions from blind people
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
MiniGPT-v2 adds unique task identifiers to a large language model so one system can perform image description, visual question answering, and visual grounding after three-stage training.
citing papers explorer
-
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
AdaMMS merges heterogeneous MLLMs via architecture mapping, linear weight interpolation, and unsupervised hyper-parameter search, outperforming prior methods on vision-language benchmarks as the first such approach without labeled data.
-
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
MiniGPT-v2 adds unique task identifiers to a large language model so one system can perform image description, visual question answering, and visual grounding after three-stage training.