M³-VQA is a benchmark showing that current MLLMs struggle with multi-entity multi-hop visual question answering without external knowledge but improve when given precise evidence.
In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers:
- M³-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering
  M³-VQA is a benchmark showing that current MLLMs struggle with multi-entity multi-hop visual question answering without external knowledge but improve when given precise evidence.
- ReFinE: Streamlining UI Mockup Iteration with Research Findings
  ReFinE is a Figma plugin that synthesizes contextualized design implications from HCI literature to provide actionable visual guidance for iterating on UI mockups.