Visual instruction tuning
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 2 unverdicted
citing papers explorer
- SmoGVLM: A Small, Graph-enhanced Vision-Language Model
  A graph-enhanced 1.3B-parameter VLM achieves up to 16.24% gains and outperforms larger VLMs by integrating structured knowledge via GNNs.
- Improving MLLM Training Efficiency via Stage-Aware Sparsity
  Introduces stage-aware sparsity via Visual Token Compressor for modality alignment and Layer Dynamic Skipper for instruction tuning to improve MLLM training efficiency.