mKG-RAG constructs multimodal KGs via MLLM-driven extraction and vision-text matching then applies dual-stage query-aware retrieval to achieve new state-of-the-art results on knowledge-based VQA.
mR2AG: Multimodal Retrieval-Reflection- Augmented Generation for Knowledge-Based VQA //
6 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 6representative citing papers
WikiSeeker boosts KB-VQA performance by using VLMs to rewrite image-informed queries for better retrieval and to decide when to route to external LLM or rely on internal VLM knowledge.
QKVQA proposes a question-focused filtering method with QFF and CDA modules that boosts accuracy by 3.2 points on Encyclopedic-VQA and 2.2 points on InfoSeek over prior state-of-the-art.
PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.
MoRE enables MLLMs to dynamically coordinate heterogeneous retrieval experts via Step-GRPO training, yielding over 7% average gains on open-domain QA benchmarks.
CogniVerse is a proposed MMRAG framework that combines cognitive reflection for retrieval filtering, Riemannian manifold alignment plus spectral graphs for retrieval, and optimal transport loss for generation, claiming better accuracy, coherence, and lower latency than prior systems.
citing papers explorer
-
mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA
mKG-RAG constructs multimodal KGs via MLLM-driven extraction and vision-text matching then applies dual-stage query-aware retrieval to achieve new state-of-the-art results on knowledge-based VQA.
-
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering
WikiSeeker boosts KB-VQA performance by using VLMs to rewrite image-informed queries for better retrieval and to decide when to route to external LLM or rely on internal VLM knowledge.
-
QKVQA: Question-Focused Filtering for Knowledge-based VQA
QKVQA proposes a question-focused filtering method with QFF and CDA modules that boosts accuracy by 3.2 points on Encyclopedic-VQA and 2.2 points on InfoSeek over prior state-of-the-art.
-
Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.
-
Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation
MoRE enables MLLMs to dynamically coordinate heterogeneous retrieval experts via Step-GRPO training, yielding over 7% average gains on open-domain QA benchmarks.
-
CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning
CogniVerse is a proposed MMRAG framework that combines cognitive reflection for retrieval filtering, Riemannian manifold alignment plus spectral graphs for retrieval, and optimal transport loss for generation, claiming better accuracy, coherence, and lower latency than prior systems.