Visualsem: a high-quality knowledge graph for vision and language. CoRR, abs/2008.09150.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images
Authors build a synthetic data generator and two-stage training pipeline for structured abstractive reasoning on multi-modal relational knowledge images, releasing STAR-64K and showing 3B/7B models outperforming GPT-4o.