Augmenting multimodal LLMs with self-reflective tokens for knowledge-based visual question answering
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
A new CoVQD-guided retrieval-augmented generation framework improves multimodal LLMs on visual question answering by using structured reasoning to retrieve better external knowledge.
citing papers explorer
- MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
  MetaRA applies metamorphic testing to VQA tasks and shows that MLLMs are sensitive to linguistic perturbations and superficial visual cues that conventional accuracy benchmarks do not detect.
- Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
  A new CoVQD-guided retrieval-augmented generation framework improves multimodal LLMs on visual question answering by using structured reasoning to retrieve better external knowledge.