Augmenting multimodal LLMs with self-reflective tokens for knowledge-based visual question answering

2025

2 Pith papers cite this work.

representative citing papers

MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems

cs.CV · 2026-05-19 · unverdicted · novelty 5.0 · ref 24

MetaRA applies metamorphic testing to VQA tasks and shows that MLLMs are sensitive to linguistic perturbations and to superficial visual cues that conventional accuracy benchmarks do not detect.

Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation

cs.CV · 2026-05-05 · unverdicted · novelty 4.0 · ref 7

A new CoVQD-guided retrieval-augmented generation framework improves multimodal LLMs on visual question answering by using structured reasoning to retrieve better external knowledge.
