Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation

Alexander Martin; Benjamin Van Durme; Chihsheng Jin; Dengjia Zhang; Eugene Yang; Kate Sanders; Reno Kriz; William Walden

arXiv: 2510.24870 · v2 · pith:PET6MDH4new · submitted 2025-10-28 · cs.CL · cs.CV · cs.IR

Abstract

We introduce MiRAGE, an evaluation framework for retrieval-augmented generation (RAG) from multimodal sources. As audiovisual media becomes a prevalent source of information online, it is essential for RAG systems to integrate information from these sources into generation. However, existing evaluations for RAG are text-centric, limiting their applicability to multimodal settings. MiRAGE is a claim-centric approach to multimodal RAG evaluation, consisting of InfoF1, which assesses factuality and information coverage, and CiteF1, which assesses citation support and completeness. We show that, when applied by humans, MiRAGE strongly aligns with extrinsic judgments of output quality. We additionally introduce an automatic implementation of MiRAGE as well as multimodal variants of three prominent text-based RAG metrics -- ALCE, ARGUE, and RAGAS -- demonstrating the limitations of text-centric work and laying the groundwork for automatic evaluation. We release open-source implementations and outline evaluation methods for multimodal RAG.
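The abstract names InfoF1 and CiteF1 but does not define them here. As a rough illustration only, the sketch below assumes each score is the harmonic mean of a precision-style and a recall-style ratio over claim-level judgments (from humans or a model). Everything in it is hypothetical and not taken from the MiRAGE release: the function names, the boolean input formats, and the specific ratios (for example, treating a claim as citation-complete when at least one of its cited sources supports it).

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Safe division: returns 0.0 when there is nothing to count."""
    return numerator / denominator if denominator else 0.0


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def info_f1(output_claims_factual: list[bool],
            reference_claims_covered: list[bool]) -> float:
    """Hypothetical InfoF1 sketch.

    precision (factuality): share of claims in the generated output judged factual.
    recall (coverage): share of reference claims that the output covers.
    """
    precision = _ratio(sum(output_claims_factual), len(output_claims_factual))
    recall = _ratio(sum(reference_claims_covered), len(reference_claims_covered))
    return f1(precision, recall)


def cite_f1(citation_support: list[list[bool]]) -> float:
    """Hypothetical CiteF1 sketch.

    citation_support[i][j] is True if the j-th source cited for output claim i
    supports that claim.
    precision (support): share of all citations that support their claim.
    recall (completeness): share of claims with at least one supporting citation.
    """
    total_citations = sum(len(cites) for cites in citation_support)
    supporting_citations = sum(sum(cites) for cites in citation_support)
    precision = _ratio(supporting_citations, total_citations)
    recall = _ratio(sum(any(cites) for cites in citation_support),
                    len(citation_support))
    return f1(precision, recall)


if __name__ == "__main__":
    # 3 of 4 output claims factual; 1 of 2 reference claims covered.
    print(round(info_f1([True, True, False, True], [True, False]), 3))
    # Claim 1 has one good and one bad citation, claim 2 one good, claim 3 none.
    print(round(cite_f1([[True, False], [True], []]), 3))
```

With these toy judgments, InfoF1 is 0.6 (precision 0.75, recall 0.5) and CiteF1 is about 0.667. Using F1 makes each score penalize both failure modes at once: an output cannot score well by being terse but accurate, or by citing everything indiscriminately.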

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering
cs.CV · 2026-05 · unverdicted · novelty 5.0

CRAFT introduces a query-conditioned pipeline with dynamic keyframe selection, ASR, and a hybrid critic loop that achieves top scores on MAGMaR 2026 for grounded multi-video question answering.

Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage
cs.IR · 2026-03 · unverdicted · novelty 5.0

Coverage-focused retrieval metrics correlate strongly with nugget coverage in RAG responses across text and multimodal benchmarks, supporting their use as performance proxies when retrieval and generation goals align.