Do Multimodal RAG Systems Leak Data? A Comprehensive Evaluation of Membership Inference and Image Caption Retrieval Attacks

Ali Al-Lawati; Suhang Wang

arxiv: 2601.17644 · v3 · submitted 2026-01-25 · 💻 cs.CR · cs.AI

Pith reviewed 2026-05-16 11:47 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords multimodal RAG · privacy risks · membership inference · data leakage · image caption retrieval · retrieval-augmented generation · vision-language models

The pith

Multimodal RAG systems leak private image data and captions through standard prompting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs an empirical study of privacy risks in multimodal Retrieval-Augmented Generation pipelines used for vision tasks such as visual question answering. It implements membership inference attacks that check whether a specific image belongs to the private dataset and caption retrieval attacks that extract associated metadata when the image is present. A sympathetic reader would care because mRAG systems deliberately connect private visual datasets to improve performance, yet the same connection creates a direct path for data exposure. The study shows these leaks occur via ordinary model prompts without needing specialized attack techniques. The results point to the need for privacy safeguards in mRAG deployments.

Core claim

Through a case study on mRAG pipelines, the authors show that standard prompting suffices to determine whether a visual asset is included in the system and, when it is, to leak its related metadata such as the caption.

What carries the argument

Membership inference and image caption retrieval attacks executed via standard model prompting on the mRAG pipeline.
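As a rough illustration of that attack surface, the sketch below builds a toy pipeline and probes it with two ordinary prompts. It is not the authors' implementation: `ToyMRAG`, `infer_membership`, `retrieve_caption`, the similarity threshold, the prompts, and the three-dimensional "embeddings" are all hypothetical, and the stub generator simply echoes retrieved context where a real vision-language model would condition on it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyMRAG:
    """Hypothetical stand-in for an mRAG pipeline: top-1 retrieval plus a stub generator."""

    def __init__(self, index, threshold=0.95):
        self.index = index          # list of (image_embedding, caption) pairs
        self.threshold = threshold  # assumed cut-off for using retrieved context

    def answer(self, image_embedding, prompt):
        # The prompt is accepted to mirror a real interface; the stub ignores it.
        embedding, caption = max(
            self.index, key=lambda item: cosine(image_embedding, item[0])
        )
        if cosine(image_embedding, embedding) >= self.threshold:
            # Assumption: the generator surfaces the retrieved context verbatim.
            return f"Yes. Retrieved context: {caption}"
        return "No. Nothing relevant was retrieved."

def infer_membership(system, image_embedding):
    """Membership inference: ask a plain question and read the yes/no answer."""
    reply = system.answer(image_embedding, "Have you seen this image before?")
    return reply.lower().startswith("yes")

def retrieve_caption(system, image_embedding):
    """Caption retrieval: ask for the caption and parse it out of the reply."""
    reply = system.answer(image_embedding, "What is the caption of this image?")
    marker = "Retrieved context: "
    return reply.split(marker, 1)[1] if marker in reply else None

private_index = [
    ([1.0, 0.0, 0.0], "a dog on a beach"),
    ([0.0, 1.0, 0.0], "an invoice for ACME"),
]
system = ToyMRAG(private_index)
member, outsider = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]

print(infer_membership(system, member), retrieve_caption(system, member))
print(infer_membership(system, outsider), retrieve_caption(system, outsider))
```

The toy leaks by construction, so it says nothing about how often a real system would leak; it only shows the shape of the two queries an attacker would issue.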

Load-bearing premise

Attacks carried out with ordinary prompting represent typical privacy threats in real mRAG systems that lack defenses or special configurations.

What would settle it

A controlled mRAG deployment with known private images where repeated standard prompts fail to identify membership or retrieve any correct caption metadata.
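One way to score such a controlled test is sketched below, assuming the experimenter already holds the attack's membership guesses, the ground-truth membership labels, and the captions the attack returned for known member images. The function names and the choice of metrics (true-positive rate, false-positive rate, normalized exact match) are assumptions for illustration and are not taken from the paper.

```python
def membership_metrics(predicted, actual):
    """Compare membership guesses against ground truth for a set of probe images."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,  # members correctly flagged
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # non-members wrongly flagged
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }

def caption_match_rate(retrieved, truth):
    """Fraction of member images whose caption was recovered, ignoring case and spacing."""
    def norm(text):
        return " ".join(text.lower().split()) if text else ""
    hits = sum(1 for r, t in zip(retrieved, truth) if norm(r) == norm(t))
    return hits / len(truth) if truth else 0.0

# Hypothetical outcome of four membership probes and two caption probes.
print(membership_metrics([True, True, False, False], [True, False, True, False]))
print(caption_match_rate(["A dog  on a beach", None], ["a dog on a beach", "a cat"]))
```

Under these assumptions, the outcome that would count against the paper's claim is a true-positive rate that stays close to the false-positive rate across repeated prompts, together with a caption match rate of zero.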

Original abstract

The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g., visual QA) introduces important privacy challenges. In particular, while mRAG provides a practical capability to connect private datasets and improve model performance, it risks the leakage of private information from these datasets. In this paper, we perform an empirical study to analyze the privacy risks inherent in the mRAG pipeline observed through standard model prompting. Specifically, we implement a case study that attempts to determine whether a visual asset (e.g., image) is included in the mRAG, and, if present, to leak the metadata (e.g., caption) related to it. Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy. Our code is published online: https://github.com/aliwister/mrag-attack-eval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

mRAG leaks image membership and captions under plain prompting, but the evaluation skips defenses so the real deployment risk stays unclear.

The paper shows that standard prompting against a multimodal RAG pipeline can reveal whether a given image is in the retrieval index and can also surface its associated caption. That is the concrete result from their case study on vision-centric tasks. They treat this as evidence that mRAG systems carry privacy exposure when private image collections are indexed.

The work is new in the sense that it applies membership inference and caption retrieval attacks specifically to the mRAG setting rather than to standalone retrieval or generation models. Releasing the evaluation code on GitHub is useful; anyone can rerun the attacks or adapt them to other pipelines. The abstract frames the contribution as an empirical warning rather than a theoretical claim, which keeps the scope honest.

The main limitation is that the attacks are demonstrated only on vanilla prompting against default configurations. No results appear for even minimal mitigations such as output filtering, prompt sanitization, or access-controlled retrieval. Without those checks it is difficult to know whether the observed leakage would survive in any realistic deployment. The abstract itself contains no quantitative numbers or error analysis, so the scale of the effect is hard to judge from the summary alone.

This paper is aimed at researchers who work on privacy for retrieval-augmented systems or multimodal models. A reader who needs a concrete example of how membership inference transfers to mRAG will find the attack surface laid out clearly. It deserves peer review because the topic is timely and the basic empirical method is reproducible, even though the current version would need additional experiments on defended setups to strengthen the practical takeaway.

Referee Report

1 major / 1 minor

Summary. The paper presents an empirical study of privacy risks in multimodal Retrieval-Augmented Generation (mRAG) systems. Through a case study it shows that standard prompting can be used to perform membership inference (determining whether a given visual asset belongs to the indexed collection) and to retrieve associated metadata such as image captions, thereby demonstrating leakage from private datasets incorporated into mRAG pipelines.

Significance. If the reported attacks remain effective across a wider range of mRAG configurations, the work would usefully draw attention to an under-studied privacy surface in vision-centric RAG deployments and supply a reproducible starting point (via the released code) for subsequent research on defenses.

major comments (1)

[Case Study] The evaluation is limited to vanilla prompting against an unspecified default retrieval-plus-generation stack. Because no results are reported under even minimal mitigations (access-controlled retrieval, prompt sanitization, or output filtering), it is unclear whether the observed leakage would persist in any realistic deployment; this directly affects the strength of the claim that mRAG systems leak private data in practice.

minor comments (1)

[Abstract] The abstract states that code is released, which aids reproducibility; however, the manuscript would benefit from an explicit description of the exact mRAG indexing and generation parameters used in the experiments.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We appreciate the opportunity to clarify the scope of our case study and address concerns about the evaluation's applicability to realistic deployments. Below we respond point by point to the major comment.

Referee: [Case Study] The evaluation is limited to vanilla prompting against an unspecified default retrieval-plus-generation stack. Because no results are reported under even minimal mitigations (access-controlled retrieval, prompt sanitization, or output filtering), it is unclear whether the observed leakage would persist in any realistic deployment; this directly affects the strength of the claim that mRAG systems leak private data in practice.

Authors: We agree that the evaluation is confined to standard prompting on a baseline mRAG stack and that no empirical results are provided under mitigations. This was a deliberate design choice to isolate and demonstrate inherent leakage risks in typical, unprotected mRAG pipelines, as stated in the abstract and introduction. The manuscript does not claim that leakage occurs under all configurations or when defenses are applied. To strengthen the paper, we will make a partial revision by (1) explicitly specifying the retrieval and generation components used in the case study, (2) adding a new discussion subsection that qualitatively analyzes how the attacks could be impacted by common mitigations such as access controls, prompt sanitization, and output filtering (drawing on related privacy literature), and (3) updating the abstract, introduction, and conclusion to more precisely frame the findings as baseline risks that motivate the development of defenses. We believe this addresses the concern while preserving the paper's focus on the unprotected case, which remains a realistic and relevant scenario for many current deployments.

revision: partial
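To make one of the mitigations named in this exchange concrete, here is a minimal sketch of an output filter that withholds a response when it contains an indexed caption verbatim. The paper evaluates nothing of this kind; `filter_output`, `normalize`, and the placeholder string are hypothetical, and the check is deliberately naive.

```python
import re

def normalize(text):
    """Lowercase and reduce text to space-separated alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def filter_output(response, private_captions, placeholder="[redacted]"):
    """Return the placeholder if the response contains any private caption verbatim."""
    padded_response = f" {normalize(response)} "
    for caption in private_captions:
        norm_caption = normalize(caption)
        # Padding with spaces forces matches to fall on whole-token boundaries.
        if norm_caption and f" {norm_caption} " in padded_response:
            return placeholder
    return response

captions = ["a dog on a beach", "a cat"]
print(filter_output("The caption is: A dog on a beach.", captions))
print(filter_output("I see a sunset over the sea.", captions))
```

A filter like this would miss paraphrased captions, and a refusal that appears only for indexed images could itself act as a membership signal, so whether it reduces leakage in practice would need to be measured.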

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with no derivations

The paper conducts an empirical case study implementing membership inference and image caption retrieval attacks on mRAG pipelines via standard prompting. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text or abstract. All claims rest on experimental observations rather than any reduction of outputs to inputs by construction, self-citation load-bearing premises, or ansatz smuggling. The analysis is therefore self-contained as a direct measurement study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This empirical paper introduces no new mathematical parameters, axioms, or entities; it applies existing attack techniques to mRAG systems.

pith-pipeline@v0.9.0 · 5452 in / 1056 out tokens · 41333 ms · 2026-05-16T11:47:19.257513+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "we implement a case study that attempts to determine whether a visual asset (e.g., image) is included in the mRAG, and, if present, to leak the metadata (e.g., caption) related to it"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "MIA and ICR attacks on image-centric mRAG under realistic visual transformations"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.