Do Multimodal RAG Systems Leak Data? A Comprehensive Evaluation of Membership Inference and Image Caption Retrieval Attacks
Pith reviewed 2026-05-16 11:47 UTC · model grok-4.3
The pith
Multimodal RAG systems leak private image data and captions through standard prompting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through a case study on mRAG pipelines, the authors show that standard prompting suffices to determine whether a visual asset is included in the system and, when it is, to leak its related metadata such as the caption.
What carries the argument
Membership inference and image caption retrieval attacks executed via standard model prompting on the mRAG pipeline.
Load-bearing premise
Attacks carried out with ordinary prompting represent typical privacy threats in real mRAG systems that lack defenses or special configurations.
What would settle it
A controlled mRAG deployment with known private images where repeated standard prompts fail to identify membership or retrieve any correct caption metadata.
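The settling experiment described above can be sketched as a small evaluation harness. This is a minimal illustration under stated assumptions, not the authors' code: `Probe`, `membership_accuracy`, and `make_toy_system` are hypothetical names, and the toy system is a stand-in that answers from a plain set, so its output says nothing about real mRAG behavior.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Probe:
    image_id: str    # hypothetical identifier for the probe image
    is_member: bool  # ground truth: is the image in the indexed collection?


def membership_accuracy(probes: Iterable[Probe], ask: Callable[[str], str]) -> float:
    """Fraction of probes whose membership is guessed correctly.

    `ask` is assumed to send a standard yes/no membership prompt for the
    given image and return the model's text reply.
    """
    probes = list(probes)
    if not probes:
        raise ValueError("need at least one probe")
    correct = 0
    for probe in probes:
        reply = ask(probe.image_id).strip().lower()
        predicted_member = reply.startswith("yes")
        correct += int(predicted_member == probe.is_member)
    return correct / len(probes)


def make_toy_system(indexed: set[str]) -> Callable[[str], str]:
    """Toy stand-in for a deployed mRAG endpoint (assumption, not a real API)."""
    def ask(image_id: str) -> str:
        return "Yes, this image is known." if image_id in indexed else "No."
    return ask


if __name__ == "__main__":
    probes = [Probe("img-1", True), Probe("img-2", False)]
    print(membership_accuracy(probes, make_toy_system({"img-1"})))
```

On a balanced probe set, accuracy close to 0.5 across repeated prompts would correspond to the "fail to identify membership" outcome, while accuracy well above 0.5 would indicate leakage.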
Original abstract
The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g., visual QA) introduces important privacy challenges. In particular, while mRAG provides a practical capability to connect private datasets and improve model performance, it risks the leakage of private information from these datasets. In this paper, we perform an empirical study to analyze the privacy risks inherent in the mRAG pipeline observed through standard model prompting. Specifically, we implement a case study that attempts to determine whether a visual asset (e.g., image) is included in the mRAG, and, if present, to leak the metadata (e.g., caption) related to it. Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy. Our code is published online: https://github.com/aliwister/mrag-attack-eval.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an empirical study of privacy risks in multimodal Retrieval-Augmented Generation (mRAG) systems. Through a case study it shows that standard prompting can be used to perform membership inference (determining whether a given visual asset belongs to the indexed collection) and to retrieve associated metadata such as image captions, thereby demonstrating leakage from private datasets incorporated into mRAG pipelines.
Significance. If the reported attacks remain effective across a wider range of mRAG configurations, the work would usefully draw attention to an under-studied privacy surface in vision-centric RAG deployments and supply a reproducible starting point (via the released code) for subsequent research on defenses.
major comments (1)
- [Case Study] The evaluation is limited to vanilla prompting against an unspecified default retrieval-plus-generation stack. Because no results are reported under even minimal mitigations (access-controlled retrieval, prompt sanitization, or output filtering), it is unclear whether the observed leakage would persist in any realistic deployment; this directly affects the strength of the claim that mRAG systems leak private data in practice.
minor comments (1)
- [Abstract] The abstract states that code is released, which aids reproducibility; however, the manuscript would benefit from an explicit description of the exact mRAG indexing and generation parameters used in the experiments.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We appreciate the opportunity to clarify the scope of our case study and address concerns about the evaluation's applicability to realistic deployments. Below we respond point by point to the major comment.
- Referee: [Case Study] The evaluation is limited to vanilla prompting against an unspecified default retrieval-plus-generation stack. Because no results are reported under even minimal mitigations (access-controlled retrieval, prompt sanitization, or output filtering), it is unclear whether the observed leakage would persist in any realistic deployment; this directly affects the strength of the claim that mRAG systems leak private data in practice.
Authors: We agree that the evaluation is confined to standard prompting on a baseline mRAG stack and that no empirical results are provided under mitigations. This was a deliberate design choice to isolate and demonstrate inherent leakage risks in typical, unprotected mRAG pipelines, as stated in the abstract and introduction. The manuscript does not claim that leakage occurs under all configurations or when defenses are applied. To strengthen the paper, we will make a partial revision by (1) explicitly specifying the retrieval and generation components used in the case study, (2) adding a new discussion subsection that qualitatively analyzes how the attacks could be impacted by common mitigations such as access controls, prompt sanitization, and output filtering (drawing on related privacy literature), and (3) updating the abstract, introduction, and conclusion to more precisely frame the findings as baseline risks that motivate the development of defenses. We believe this addresses the concern while preserving the paper's focus on the unprotected case, which remains a realistic and relevant scenario for many current deployments.
Revision: partial
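To make one of the mitigations named above concrete, here is a minimal sketch of output filtering. It is an assumption for illustration only and does not come from the paper or its released code: `redact_captions` is a hypothetical helper that masks any retrieved caption appearing verbatim (case-insensitively) in the generated response.

```python
import re


def redact_captions(response: str, retrieved_captions: list[str],
                    mask: str = "[REDACTED]") -> str:
    """Mask verbatim occurrences of retrieved captions in a model response.

    Hypothetical post-generation filter: it only catches exact
    (case-insensitive) copies, not paraphrases.
    """
    for caption in retrieved_captions:
        caption = caption.strip()
        if not caption:
            continue  # an empty pattern would match everywhere
        pattern = re.compile(re.escape(caption), re.IGNORECASE)
        response = pattern.sub(mask, response)
    return response


if __name__ == "__main__":
    reply = "The caption is: A dog on a beach."
    print(redact_captions(reply, ["a dog on a beach"]))
```

A filter of this kind would not stop a paraphrased caption or a membership signal carried by the yes/no answer itself, which is why empirical results under mitigations would be needed to judge how much leakage remains.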
Circularity Check
No circularity: purely empirical evaluation with no derivations
The paper conducts an empirical case study implementing membership inference and image caption retrieval attacks on mRAG pipelines via standard prompting. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text or abstract. All claims rest on experimental observations rather than any reduction of outputs to inputs by construction, self-citation load-bearing premises, or ansatz smuggling. The analysis is therefore self-contained as a direct measurement study.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "we implement a case study that attempts to determine whether a visual asset (e.g., image) is included in the mRAG, and, if present, to leak the metadata (e.g., caption) related to it"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "MIA and ICR attacks on image-centric mRAG under realistic visual transformations"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.