VQA: Visual Question Answering

Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, Devi Parikh · 2015

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.

The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting

cs.CV · 2026-05-18 · conditional · novelty 7.0

MixCount provides a scalable synthetic dataset for mixed-object counting that improves state-of-the-art models on real benchmarks, cutting MAE by 20.14% on FSC-147 and 18.3% on PairTally.

Bridging Structure and Language: Graph-Based Visual Reasoning for Autonomous Road Understanding

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

A graph-grounded Combined Road Substrate framework generates traceable QA pairs from road maps to improve small VLMs on compositional road reasoning tasks.

Discovering Failure Modes in Vision-Language Models using RL

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

An RL-based questioner agent adaptively generates queries to discover novel failure modes in VLMs without human intervention.

Enhancing Multimodal In-Context Learning via Inductive-Deductive Reasoning

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

A framework with similarity-based visual token compression, dynamic attention rebalancing, and explicit inductive-deductive chain-of-thought improves multimodal ICL performance across eight benchmarks for open-source VLMs.

Why Build an Assistant in Minecraft?

cs.AI · 2019-07-22 · unverdicted · novelty 4.0

A rationale is presented for developing an assistant in Minecraft to advance natural language understanding and dialogue learning.

citing papers explorer

Showing 6 of 6 citing papers.

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis
cs.CV · 2026-05-21 · unverdicted · none · ref 2

The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting
cs.CV · 2026-05-18 · conditional · none · ref 5

Bridging Structure and Language: Graph-Based Visual Reasoning for Autonomous Road Understanding
cs.CV · 2026-05-20 · unverdicted · none · ref 32

Discovering Failure Modes in Vision-Language Models using RL
cs.CV · 2026-04-06 · unverdicted · none · ref 1

Enhancing Multimodal In-Context Learning via Inductive-Deductive Reasoning
cs.CV · 2026-05-04 · unverdicted · none · ref 2

Why Build an Assistant in Minecraft?
cs.AI · 2019-07-22 · unverdicted · none · ref 6
