
arxiv: 2601.17644 · v3 · submitted 2026-01-25 · 💻 cs.CR · cs.AI

Do Multimodal RAG Systems Leak Data? A Comprehensive Evaluation of Membership Inference and Image Caption Retrieval Attacks

Pith reviewed 2026-05-16 11:47 UTC · model grok-4.3

keywords multimodal RAG · privacy risks · membership inference · data leakage · image caption retrieval · retrieval-augmented generation · vision-language models

The pith

Multimodal RAG systems leak private image data and captions through standard prompting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs an empirical study of privacy risks in multimodal Retrieval-Augmented Generation pipelines used for vision tasks such as visual question answering. It implements membership inference attacks that check whether a specific image belongs to the private dataset and caption retrieval attacks that extract associated metadata when the image is present. A sympathetic reader would care because mRAG systems deliberately connect private visual datasets to improve performance, yet the same connection creates a direct path for data exposure. The study shows these leaks occur via ordinary model prompts without needing specialized attack techniques. The results point to the need for privacy safeguards in mRAG deployments.

Core claim

Through a case study on mRAG pipelines, the authors show that standard prompting suffices to determine whether a visual asset is included in the system and, when it is, to leak its related metadata such as the caption.

What carries the argument

Membership inference and image caption retrieval attacks executed via standard model prompting on the mRAG pipeline.
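The two-step attack described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyMRAG`, its similarity threshold, the stub generator that answers from retrieved context, and the probe prompts are hypothetical and are not taken from the paper or its released code. The sketch only shows the shape of the interaction: one ordinary prompt to test membership, a second to pull the caption.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    embedding: Tuple[float, ...]  # stand-in for an image embedding (hypothetical)
    caption: str                  # private metadata stored alongside the image


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ToyMRAG:
    """Hypothetical unprotected pipeline: nearest-neighbour retrieval plus a stub generator."""

    def __init__(self, entries: List[Entry], threshold: float = 0.99):
        self.entries = entries
        self.threshold = threshold  # assumed cut-off for "same image"

    def answer(self, query_embedding, prompt: str) -> str:
        best = max(self.entries, key=lambda e: cosine(e.embedding, query_embedding))
        matched = cosine(best.embedding, query_embedding) >= self.threshold
        # Stub generator: a real vision-language model would condition on the
        # retrieved context; this stub simply answers from it.
        if "caption" in prompt.lower():
            return f"Caption: {best.caption}" if matched else "I cannot find a caption for this image."
        return "Yes, I have seen this image." if matched else "No, I have not seen this image."


def membership_probe(system: ToyMRAG, image_embedding) -> bool:
    """Ordinary yes/no prompt used as a membership signal."""
    reply = system.answer(image_embedding, "Have you seen this image before? Answer yes or no.")
    return reply.lower().startswith("yes")


def caption_probe(system: ToyMRAG, image_embedding) -> Optional[str]:
    """Ordinary prompt asking the system to repeat stored metadata."""
    reply = system.answer(image_embedding, "Repeat the caption stored for this image.")
    prefix = "Caption: "
    return reply[len(prefix):] if reply.startswith(prefix) else None


system = ToyMRAG([
    Entry((1.0, 0.0), "patient X-ray, ward 3"),
    Entry((0.0, 1.0), "office badge photo"),
])
```

In this toy setting, `membership_probe(system, (1.0, 0.0))` flags an indexed image and `caption_probe` then returns its stored caption; how reliably a real generator behaves this way is exactly what the paper measures and is not something this sketch can show.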

Load-bearing premise

Attacks carried out with ordinary prompting represent typical privacy threats in real mRAG systems that lack defenses or special configurations.

What would settle it

A controlled mRAG deployment with known private images where repeated standard prompts fail to identify membership or retrieve any correct caption metadata.
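One way to score such a controlled experiment is sketched below, assuming the experimenter holds ground-truth membership labels and has recorded the attack's yes/no guesses. The function name `attack_rates` and the toy labels are hypothetical. The sketch computes the true-positive and false-positive rates; an attack whose two rates are roughly equal is doing no better than guessing, which is the outcome that would count against the paper's claim.

```python
def attack_rates(is_member, predicted_member):
    """Return (true-positive rate, false-positive rate) for a membership attack.

    is_member: ground-truth booleans from the controlled deployment.
    predicted_member: the attack's boolean guesses, in the same order.
    """
    pairs = list(zip(is_member, predicted_member))
    tp = sum(1 for m, p in pairs if m and p)
    fn = sum(1 for m, p in pairs if m and not p)
    fp = sum(1 for m, p in pairs if not m and p)
    tn = sum(1 for m, p in pairs if not m and not p)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy labels for illustration only; these are not results from the paper.
truth = [True, True, False, False]
guesses = [True, False, False, False]
tpr, fpr = attack_rates(truth, guesses)
```

With these toy inputs the attack finds one of two members and raises no false alarms, so `tpr` is 0.5 and `fpr` is 0.0.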

Original abstract

The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g., visual QA) introduces important privacy challenges. In particular, while mRAG provides a practical capability to connect private datasets and improve model performance, it risks the leakage of private information from these datasets. In this paper, we perform an empirical study to analyze the privacy risks inherent in the mRAG pipeline observed through standard model prompting. Specifically, we implement a case study that attempts to determine whether a visual asset (e.g., image) is included in the mRAG, and, if present, to leak the metadata (e.g., caption) related to it. Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy. Our code is published online: https://github.com/aliwister/mrag-attack-eval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents an empirical study of privacy risks in multimodal Retrieval-Augmented Generation (mRAG) systems. Through a case study it shows that standard prompting can be used to perform membership inference (determining whether a given visual asset belongs to the indexed collection) and to retrieve associated metadata such as image captions, thereby demonstrating leakage from private datasets incorporated into mRAG pipelines.

Significance. If the reported attacks remain effective across a wider range of mRAG configurations, the work would usefully draw attention to an under-studied privacy surface in vision-centric RAG deployments and supply a reproducible starting point (via the released code) for subsequent research on defenses.

major comments (1)
  1. [Case Study] The evaluation is limited to vanilla prompting against an unspecified default retrieval-plus-generation stack. Because no results are reported under even minimal mitigations (access-controlled retrieval, prompt sanitization, or output filtering), it is unclear whether the observed leakage would persist in any realistic deployment; this directly affects the strength of the claim that mRAG systems leak private data in practice.
minor comments (1)
  1. [Abstract] The abstract states that code is released, which aids reproducibility; however, the manuscript would benefit from an explicit description of the exact mRAG indexing and generation parameters used in the experiments.
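To make the major comment's "output filtering" concrete, here is a minimal sketch of one such mitigation, assuming the deployment can check generated text against the captions it has indexed. The function `redact_captions` and the placeholder string are hypothetical and do not come from the paper. The filter only catches verbatim (case-insensitive) copies: paraphrased captions and the yes/no membership signal would pass through unchanged, so it is a baseline to evaluate against rather than a defense.

```python
import re


def redact_captions(reply, indexed_captions, placeholder="[redacted]"):
    """Replace verbatim occurrences of indexed captions in a generated reply."""
    for caption in indexed_captions:
        if caption.strip():  # skip empty captions, which would match everywhere
            reply = re.sub(
                re.escape(caption),      # match the caption literally
                lambda _m: placeholder,  # callable avoids backslash handling in the replacement
                reply,
                flags=re.IGNORECASE,
            )
    return reply


filtered = redact_captions("Caption: Office Badge Photo.", ["office badge photo"])
```

Rerunning the attacks with and without a filter of this kind is the sort of comparison the referee asks for.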

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We appreciate the opportunity to clarify the scope of our case study and address concerns about the evaluation's applicability to realistic deployments. Below we respond point by point to the major comment.

Point-by-point responses
  1. Referee: [Case Study] The evaluation is limited to vanilla prompting against an unspecified default retrieval-plus-generation stack. Because no results are reported under even minimal mitigations (access-controlled retrieval, prompt sanitization, or output filtering), it is unclear whether the observed leakage would persist in any realistic deployment; this directly affects the strength of the claim that mRAG systems leak private data in practice.

    Authors: We agree that the evaluation is confined to standard prompting on a baseline mRAG stack and that no empirical results are provided under mitigations. This was a deliberate design choice to isolate and demonstrate inherent leakage risks in typical, unprotected mRAG pipelines, as stated in the abstract and introduction. The manuscript does not claim that leakage occurs under all configurations or when defenses are applied. To strengthen the paper, we will make a partial revision by (1) explicitly specifying the retrieval and generation components used in the case study, (2) adding a new discussion subsection that qualitatively analyzes how the attacks could be impacted by common mitigations such as access controls, prompt sanitization, and output filtering (drawing on related privacy literature), and (3) updating the abstract, introduction, and conclusion to more precisely frame the findings as baseline risks that motivate the development of defenses. We believe this addresses the concern while preserving the paper's focus on the unprotected case, which remains a realistic and relevant scenario for many current deployments.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with no derivations

Full rationale

The paper conducts an empirical case study implementing membership inference and image caption retrieval attacks on mRAG pipelines via standard prompting. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text or abstract. All claims rest on experimental observations rather than any reduction of outputs to inputs by construction, self-citation load-bearing premises, or ansatz smuggling. The analysis is therefore self-contained as a direct measurement study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This empirical paper introduces no new mathematical parameters, axioms, or entities; it applies existing attack techniques to mRAG systems.

pith-pipeline@v0.9.0 · 5452 in / 1056 out tokens · 41333 ms · 2026-05-16T11:47:19.257513+00:00 · methodology

