arXiv preprint arXiv:2407.12735
3 Pith papers cite this work.
Citing papers by year: 2026 (3).
Citing papers
- WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata
  WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.
- WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition
  WikiCLIP is an efficient contrastive baseline for open-domain visual entity recognition. It improves accuracy by 16% on unseen entities in the OVEN benchmark and runs nearly 100 times faster than leading generative models.
- MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval
  MCERF delivers a 41.1% relative accuracy gain on the DesignQA benchmark by combining ColPali vision-language retrieval with four specialized reasoning modes and dynamic routing.