How many claims are with the highest percentage of reasoning steps in the author’s proposed dataset?
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Knowing When Not to Answer: Evaluating Abstention in Multimodal Reasoning Systems
MM-AQA shows frontier VLMs rarely abstain on unanswerable multimodal questions, multi-agent setups improve abstention at an accuracy cost, and effective abstention needs training rather than prompting or extra agents.