VISTAQA is a new benchmark for joint visual question answering correctness and pixel-level grounding, evaluated with the GROVE metric that uses per-sample geometric mean to require both dimensions to succeed.
Modeling context in referring expressions
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.
citing papers explorer
-
VISTAQA: Benchmarking Joint Visual Question Answering and Pixel-Level Evidence
VISTAQA is a new benchmark for joint visual question answering correctness and pixel-level grounding, evaluated with the GROVE metric that uses per-sample geometric mean to require both dimensions to succeed.
-
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.