ChinaHeritaQA is a new bilingual VQA benchmark dataset with 2,279 images and 14,133 QA pairs for evaluating cultural reasoning abilities of VLMs on Chinese World Heritage sites across seven cognitive dimensions.
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Citation-polarity summary: background 1
Representative citing papers
The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
citing papers explorer
- ChinaHeritaQA: A Culturally-Grounded Visual Question Answering Dataset for World Heritage Sites in China