mR2AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA

Tao Zhang, Ziqi Zhang, Zongyang Ma, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Yuxuan Zhao, Zehua Xie, et al. · 2024 · arXiv 2411.15041

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA

cs.CV · 2025-08-07 · unverdicted · novelty 7.0

mKG-RAG constructs multimodal KGs via MLLM-driven extraction and vision-text matching, then applies dual-stage query-aware retrieval to achieve new state-of-the-art results on knowledge-based VQA.

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

WikiSeeker boosts KB-VQA performance by using VLMs to rewrite image-informed queries for better retrieval and to decide whether to route to an external LLM or rely on the VLM's internal knowledge.

QKVQA: Question-Focused Filtering for Knowledge-based VQA

cs.IR · 2026-01-20 · unverdicted · novelty 6.0

QKVQA proposes a question-focused filtering method with QFF and CDA modules that boosts accuracy by 3.2 points on Encyclopedic-VQA and 2.2 points on InfoSeek over prior state-of-the-art.

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering

cs.CV · 2025-08-31 · unverdicted · novelty 6.0

PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.

Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation

cs.CL · 2025-05-28 · unverdicted · novelty 6.0

MoRE enables MLLMs to dynamically coordinate heterogeneous retrieval experts via Step-GRPO training, yielding over 7% average gains on open-domain QA benchmarks.

CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning

cs.CV · 2026-05-28 · unverdicted · novelty 3.0

CogniVerse is a proposed MMRAG framework that combines cognitive reflection for retrieval filtering, Riemannian manifold alignment plus spectral graphs for retrieval, and optimal transport loss for generation, claiming better accuracy, coherence, and lower latency than prior systems.

citing papers explorer

mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA

cs.CV · 2025-08-07 · unverdicted · none · ref 56

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering

cs.CV · 2026-04-07 · unverdicted · none · ref 39

QKVQA: Question-Focused Filtering for Knowledge-based VQA

cs.IR · 2026-01-20 · unverdicted · none · ref 37

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering

cs.CV · 2025-08-31 · unverdicted · none · ref 55

Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation

cs.CL · 2025-05-28 · unverdicted · none · ref 44

CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning

cs.CV · 2026-05-28 · unverdicted · none · ref 121
