A graph-enhanced 1.3B-parameter VLM achieves up to 16.24% gains and outperforms larger VLMs by integrating structured knowledge via GNNs.
A- okvqa: A benchmark for visual question answering us- ing world knowledge,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
SmoGVLM: A Small, Graph-enhanced Vision-Language Model
A graph-enhanced 1.3B-parameter VLM achieves up to 16.24% gains and outperforms larger VLMs by integrating structured knowledge via GNNs.