arXiv preprint arXiv:2407.12735

EchoSight: Advancing visual-language models with Wiki knowledge · 2024 · arXiv 2407.12735

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

cs.CV · 2026-05-20 · conditional · novelty 7.0

WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.

WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition

cs.CV · 2026-03-10 · unverdicted · novelty 7.0

WikiCLIP delivers an efficient contrastive baseline for open-domain visual entity recognition that improves accuracy by 16% on OVEN unseen entities and runs nearly 100 times faster than leading generative models.

MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval

cs.IR · 2026-01-31 · unverdicted · novelty 6.0

MCERF delivers a 41.1% relative accuracy gain on the DesignQA benchmark by combining ColPali vision-language retrieval with four specialized reasoning modes and dynamic routing.

citing papers explorer

Showing 3 of 3 citing papers.

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

cs.CV · 2026-05-20 · conditional · none · ref 100

WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.

WikiCLIP: An Efficient Contrastive Baseline for Open-domain Visual Entity Recognition

cs.CV · 2026-03-10 · unverdicted · none · ref 43

WikiCLIP delivers an efficient contrastive baseline for open-domain visual entity recognition that improves accuracy by 16% on OVEN unseen entities and runs nearly 100 times faster than leading generative models.

MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval

cs.IR · 2026-01-31 · unverdicted · none · ref 33

MCERF delivers a 41.1% relative accuracy gain on the DesignQA benchmark by combining ColPali vision-language retrieval with four specialized reasoning modes and dynamic routing.
