European conference on computer vision , pages=

A diagram is worth a dozen images , author= · 2016

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

cs.CV · 2026-05-20 · conditional · novelty 7.0

WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.

Deep Pre-Alignment for VLMs

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.

Large Vision-Language Models Get Lost in Attention

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.

DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA

cs.CL · 2026-04-29 · unverdicted · novelty 5.0

DIAGRAMS introduces a schema-driven annotation tool that proposes reasoning-level evidence regions for Diagram QA pairs and reports 85.39% precision and 75.30% recall against human final selections on six datasets.

citing papers explorer

Showing 4 of 4 citing papers.

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata cs.CV · 2026-05-20 · conditional · none · ref 126
WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.
Deep Pre-Alignment for VLMs cs.CV · 2026-05-14 · unverdicted · none · ref 123
Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.
Large Vision-Language Models Get Lost in Attention cs.AI · 2026-05-07 · unverdicted · none · ref 97
In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.
DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA cs.CL · 2026-04-29 · unverdicted · none · ref 13
DIAGRAMS introduces a schema-driven annotation tool that proposes reasoning-level evidence regions for Diagram QA pairs and reports 85.39% precision and 75.30% recall against human final selections on six datasets.

European conference on computer vision , pages=

fields

years

verdicts

representative citing papers

citing papers explorer