Action-aware linguistic skeleton optimization network for non-autoregressive video captioning
2 Pith papers cite this work.
Citing papers by field: cs.CV (2). By year: 2026 (2). By verdict: UNVERDICTED (2).
Citing papers:
- MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
  MetaRA applies metamorphic testing to VQA tasks and shows that MLLMs are sensitive to linguistic perturbations and superficial visual cues that conventional accuracy benchmarks do not detect.
- Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
  A new CoVQD-guided retrieval-augmented generation framework improves multimodal LLMs on visual question answering by using structured reasoning to retrieve more useful external knowledge.