Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
V-MAGE: A Game Evaluation Framework for Assessing Vision-Centric Capabilities in Multimodal Large Language Models
V-MAGE introduces a game-based benchmark with dynamic ELO ranking to evaluate MLLMs on vision-centric interactive reasoning in continuous environments, showing gaps versus human performance.