arxiv: 2606.24623 · v1 · pith:OBOD2EMHnew · submitted 2026-06-23 · 💻 cs.CL · cs.AI

Privacy-Preserving RAG via Multi-Agent Semantic Rewriting: Achieving Confidentiality Without Compromising Contextual Fidelity

Pith reviewed 2026-06-25 23:43 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords privacy-preserving RAG · multi-agent semantic rewriting · privacy leakage reduction · contextual fidelity · semantic sanitization · retrieval-augmented generation · confidentiality in LLMs

The pith

A three-agent system rewrites retrieved text to strip out private identifiers while keeping the meaning that downstream models need for accurate responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a multi-agent framework to sanitize content retrieved for generation tasks so that private information does not leak through malicious prompts. Three agents handle privacy extraction, semantic analysis, and reconstruction in sequence to remove identifiers without altering core meaning. Tests on medical and general datasets with several large models show large drops in exposed private facts under attack while fidelity scores stay at or above those of earlier methods. The rewriting step runs once offline, so live queries incur no extra delay. This matters because retrieval-augmented systems are increasingly used in domains where both accuracy and confidentiality are required.

Core claim

By deploying three specialized agents—one to extract privacy risks, one to analyze semantics, and one to reconstruct the text—the framework removes sensitive identifiers from retrieved passages while preserving their semantic core. Evaluation across the ChatDoctor and Wiki-PII datasets and six language models demonstrates that targeted information exposure falls dramatically, for example from 144 instances to 1 in LLaMA-3-8B, and that a BLEU-1 fidelity score of 0.122 exceeds the 0.117 achieved by the SAGE baseline. The entire process executes as offline preprocessing and therefore adds no latency to online inference.

What carries the argument

A collaborative three-agent pipeline that performs privacy extraction, semantic analysis, and text reconstruction to sanitize retrieved content.
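
The three-stage division can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper's agents are LLM-based, whereas the regex patterns, the `Analysis` container, and all function names here are hypothetical stand-ins chosen so the example runs on its own.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for the LLM-based privacy extraction agent.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


@dataclass
class Analysis:
    spans: list  # (label, matched text) pairs flagged as sensitive
    core: str    # text judged to carry the meaning


def extract_privacy(text):
    """Agent 1 (stand-in): list sensitive identifiers found in the text."""
    return [
        (label, match.group())
        for label, pattern in PII_PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def analyze_semantics(text, spans):
    """Agent 2 (stand-in): treat the whole passage as the semantic core."""
    return Analysis(spans=spans, core=text)


def reconstruct(analysis):
    """Agent 3 (stand-in): replace each flagged identifier with a typed placeholder."""
    out = analysis.core
    for label, value in analysis.spans:
        out = out.replace(value, f"[{label}]")
    return out


def sanitize(text):
    """Run the three stages in sequence."""
    spans = extract_privacy(text)
    return reconstruct(analyze_semantics(text, spans))


doc = "Patient reports chest pain; contact jane.doe@example.com or 555-123-4567."
print(sanitize(doc))
```

In a real system each stand-in would be replaced by a prompted model call; the point of the sketch is only the hand-off between stages.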

If this is right

  • Targeted privacy exposure decreases substantially across multiple models and datasets.
  • Semantic fidelity measured by BLEU-1 matches or exceeds that of existing sanitization approaches.
  • Because rewriting runs as offline preprocessing, it adds no latency to online inference.
  • The method applies to both domain-specific and general knowledge datasets.
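
For readers unfamiliar with the fidelity metric, BLEU-1 is the clipped unigram precision of a candidate against a reference, multiplied by a brevity penalty. The sketch below is a plain-Python version under two assumptions not confirmed by this page: whitespace tokenization, and treating the original passage as the reference and the rewritten passage as the candidate. The function name `bleu1` is illustrative.

```python
import math
from collections import Counter


def bleu1(reference, candidate):
    """Clipped unigram precision times the standard BLEU brevity penalty."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    cand_counts = Counter(cand)
    # Each candidate word is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(cand)
    # Candidates shorter than the reference are penalized.
    penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * precision


original = "the patient has a fever"
rewritten = "the patient has fever"
print(round(bleu1(original, rewritten), 4))
```

Because the score rewards surface word overlap, a rewrite that paraphrases heavily can score low while still preserving meaning, which is one reason overlap metrics are only a proxy for fidelity.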

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent division could be inserted into other retrieval pipelines that face similar leakage risks.
  • If the agents are updated with stronger models, the reduction in exposure might extend to previously unseen identifier patterns.
  • Because rewriting occurs once, the approach can be layered onto existing RAG systems without altering their query-time architecture.
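
The offline/online split can be illustrated with a toy store. Following the Figure 2 caption, retrieval is scored against the original text while only the rewritten text is returned. Everything else here is an assumption made for the example: the word-overlap ranking, the `build_store` and `retrieve` names, and the one-line `toy_rewrite` that stands in for the rewriting pipeline.

```python
def build_store(raw_docs, rewrite):
    """Offline: rewrite every document once and keep it next to the original."""
    return [{"original": doc, "rewritten": rewrite(doc)} for doc in raw_docs]


def retrieve(store, query, k=1):
    """Online: rank by word overlap with the original, return only the rewritten text."""
    query_words = set(query.lower().split())

    def overlap(record):
        return len(query_words & set(record["original"].lower().split()))

    ranked = sorted(store, key=overlap, reverse=True)
    return [record["rewritten"] for record in ranked[:k]]


def toy_rewrite(doc):
    # Hypothetical stand-in for the offline rewriting step.
    return doc.replace("Alice Smith", "[NAME]")


store = build_store(
    [
        "Alice Smith was treated for asthma with an inhaler",
        "Bob was treated for a broken arm",
    ],
    toy_rewrite,
)
print(retrieve(store, "asthma inhaler treatment"))
```

No rewriting happens inside `retrieve`, which is what keeps the query path free of extra model calls.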

Load-bearing premise

The three agents can reliably detect all sensitive identifiers and reconstruct text that retains the exact semantic content required for correct downstream RAG performance across varied domains and attack types.

What would settle it

An experiment showing that rewritten documents still permit recovery of most original private identifiers under a targeted attack would demonstrate that the sanitization step fails to achieve the claimed confidentiality.
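
One simple way to run such a check is to count how many known identifiers reappear verbatim in model outputs. The sketch below assumes case-insensitive substring matching and a hand-supplied identifier list; the paper's actual counting procedure is not described on this page, so `count_exposures` is a hypothetical helper, not the authors' metric.

```python
def count_exposures(outputs, identifiers):
    """Count distinct known identifiers that appear verbatim in at least one output."""
    leaked = set()
    for output in outputs:
        lowered = output.lower()
        for identifier in identifiers:
            if identifier.lower() in lowered:
                leaked.add(identifier)
    return len(leaked), sorted(leaked)


outputs = [
    "The patient John Carter lives in Boston.",
    "I cannot share that.",
]
identifiers = ["John Carter", "555-123-4567", "boston"]
print(count_exposures(outputs, identifiers))
```

Verbatim matching undercounts paraphrased or partial leaks, so a stricter test would also need fuzzy or entity-level matching.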

Figures

Figures reproduced from arXiv: 2606.24623 by Derek F. Wong, Huafei Xing, Jianbin Li, Tao Fang, Tianyu Zhang, Yuanhe Zhao.

Figure 1: Comparison of semantic rewriting strategies. The degree to which a strategy preserves the original semantics significantly impacts the rewritten content, which in turn affects the performance and reliability of downstream tasks.
Figure 2: Overview of the proposed multi-agent semantic rewriting framework for privacy-preserving RAG. The offline phase sanitizes raw documents through three collaborative agents (Pri-Extra agent, Sem-Extra agent, and Reconstruction agent), while the online phase retrieves over the original corpus but supplies only rewritten text to the LLM.
Original abstract

Retrieval-Augmented Generation enhances large language models by incorporating external knowledge, but deploying it in sensitive scenarios risks privacy leakage via malicious prompts. To address this, we propose a multi-agent framework that sanitizes retrieved content through semantic rewriting. By employing three specialized agents for privacy extraction, semantic analysis, and reconstruction, our approach collaboratively removes sensitive identifiers while preserving the semantic core. We evaluate the framework on the ChatDoctor and Wiki-PII datasets across six large language models. Experimental results demonstrate a significant reduction in privacy leakage under targeted attacks. For instance, we reduced targeted information exposure in LLaMA-3-8B from 144 instances in the baseline to just 1. Furthermore, we maintain strong contextual fidelity with a BLEU-1 score of 0.122, outperforming the existing SAGE method's 0.117. Finally, the framework operates as an asynchronous preprocessing module, introducing no additional latency to online inference, as all rewriting is executed as a one-time offline preprocessing step. To promote reproducibility, the source code of this work is publicly available at https://github.com/foursoils/Privacy-Preserving-RAG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a multi-agent framework for privacy-preserving RAG consisting of a privacy extraction agent, semantic analysis agent, and reconstruction agent that collaboratively sanitize retrieved contexts by removing sensitive identifiers while aiming to preserve semantic content. It evaluates the approach on the ChatDoctor and Wiki-PII datasets across six LLMs, reporting a reduction in targeted information exposure (e.g., from 144 to 1 instances on LLaMA-3-8B) and a BLEU-1 score of 0.122 (vs. SAGE's 0.117), with all rewriting performed as offline preprocessing to avoid inference latency. The source code is released publicly.

Significance. If the central claims hold, the work would offer a practical, low-latency method for reducing privacy leakage in RAG deployments on sensitive data without requiring changes to the underlying LLM or retriever. The public code release is a positive factor for reproducibility. However, the significance is constrained by the absence of direct evidence that the rewritten contexts support correct downstream RAG behavior.

major comments (2)
  1. [Abstract / Evaluation] Abstract and evaluation sections: The claim that the method maintains 'contextual fidelity' required for RAG is not supported by any downstream task metrics. Only BLEU-1 (0.122) is reported as a proxy; no accuracy, F1, or exact-match scores are provided for question-answering performance on held-out queries when using the sanitized contexts versus the original contexts. This directly undermines the 'without compromising contextual fidelity' part of the title and abstract claim.
  2. [Abstract] Abstract: The reported reduction in 'targeted information exposure' (144 to 1 on LLaMA-3-8B) lacks accompanying details on attack construction, the precise definition and counting of exposure instances, dataset construction, or statistical tests. Without these, it is not possible to assess whether the privacy improvement is robust or generalizes beyond the specific attack and datasets used.
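
The downstream metrics the first comment asks for are commonly computed as exact match and token-level F1 over normalized answers. The sketch below follows the widely used SQuAD-style normalization (lowercasing, stripping punctuation and English articles); it is a generic illustration, not an evaluation taken from the paper, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Aspirin.", "aspirin"))
print(token_f1("low dose aspirin", "aspirin"))
```

Running the same question set once with original contexts and once with sanitized contexts, then comparing the averaged scores, would give the utility comparison the referee describes.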

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of our evaluation and presentation. We address each major comment below and commit to revisions that strengthen the manuscript's claims and clarity.

  1. Referee: [Abstract / Evaluation] Abstract and evaluation sections: The claim that the method maintains 'contextual fidelity' required for RAG is not supported by any downstream task metrics. Only BLEU-1 (0.122) is reported as a proxy; no accuracy, F1, or exact-match scores are provided for question-answering performance on held-out queries when using the sanitized contexts versus the original contexts. This directly undermines the 'without compromising contextual fidelity' part of the title and abstract claim.

    Authors: We agree that downstream RAG task metrics (e.g., QA accuracy or F1 on held-out queries) would provide more direct evidence that semantic rewriting preserves utility for retrieval-augmented generation. BLEU-1 was selected as a standard n-gram overlap proxy for contextual similarity between original and rewritten contexts, and it shows a modest improvement over SAGE. However, this is a valid limitation of the current evaluation. In the revised manuscript we will add end-to-end experiments measuring question-answering performance when the sanitized contexts are used in place of the originals across the evaluated models and datasets. revision: yes

  2. Referee: [Abstract] Abstract: The reported reduction in 'targeted information exposure' (144 to 1 on LLaMA-3-8B) lacks accompanying details on attack construction, the precise definition and counting of exposure instances, dataset construction, or statistical tests. Without these, it is not possible to assess whether the privacy improvement is robust or generalizes beyond the specific attack and datasets used.

    Authors: The full paper provides the attack methodology, exposure definition, and dataset details in Sections 3 and 4, but we acknowledge that the abstract and high-level presentation do not sufficiently summarize these elements for readers. We will revise the abstract to include a concise description of the targeted attack, the exact counting procedure for exposure instances, and note the datasets used. We will also add a short subsection or appendix entry on statistical considerations if applicable. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with no derivations or self-referential definitions

The paper describes a multi-agent semantic rewriting method and reports direct experimental measurements (targeted information exposure counts dropping from 144 to 1, BLEU-1 scores of 0.122 vs 0.117) on fixed datasets (ChatDoctor, Wiki-PII) and models (LLaMA-3-8B etc.). No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text; results are not reduced to quantities defined by the authors' own inputs or self-citations. The central claims rest on observable outputs rather than tautological redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The framework depends on the empirical performance of LLM-based agents for identification and rewriting; no free parameters are introduced in the abstract, and the three agents are new components whose effectiveness is asserted via experiment rather than independent evidence.

axioms (1)
  • domain assumption: Large language models can be prompted to perform reliable privacy extraction, semantic analysis, and text reconstruction when used as specialized agents.
    The entire pipeline rests on the assumption that the underlying LLMs possess sufficient capability for these subtasks without additional training or verification.
invented entities (3)
  • Privacy extraction agent (no independent evidence)
    purpose: Identifies sensitive identifiers in retrieved content
    New role introduced by the framework; no independent evidence provided beyond the reported experiments.
  • Semantic analysis agent (no independent evidence)
    purpose: Analyzes the semantic core of the content
    New role introduced by the framework; no independent evidence provided beyond the reported experiments.
  • Reconstruction agent (no independent evidence)
    purpose: Rebuilds sanitized content while preserving meaning
    New role introduced by the framework; no independent evidence provided beyond the reported experiments.

pith-pipeline@v0.9.1-grok · 5750 in / 1457 out tokens · 30955 ms · 2026-06-25T23:43:40.215903+00:00 · methodology

Reference graph

Works this paper leans on

26 extracted references · 5 canonical work pages

  1. [1]

    2318–2335

    M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, in: Findings of the association for computational linguistics: ACL 2024, pp. 2318–2335. Chowdhury, J.R., Caragea, C.,

  2. [2]

    arXiv preprint arXiv:2501.13122

    Zero-shot verification-guided chain of thoughts. arXiv preprint arXiv:2501.13122. Cohen, S., Bitton, R., Nassi, B.,

  3. [3]

    arXiv preprint arXiv:2409.08045

    Unleashing worms and extracting data: Escalating the outcome of attacks against rag-based inference in scale and severity using jailbreaking. arXiv preprint arXiv:2409.08045. De Jong, M., Zemlyanskiy, Y., Ainslie, J., FitzGerald, N., Sanghai, S., Sha, F., Cohen, W.,

  4. [4]

    11534–11547

    Fido: Fusion-in-decoder optimized for stronger performance and faster inference, in: Findings of the Association for Computational Linguistics: ACL 2023, pp. 11534–11547. Di Maio, C., Cosci, C., Maggini, M., Poggioni, V., Melacci, S.,

  5. [5]

    Pirates of the rag: Adaptively attacking llms to leak knowledge bases, in: Proceedings of the 28th European Conference on Artificial Intelligence, IOS Press. pp. 4041–4048. doi:10.3233/FAIA251293. Feng, Q., Sun, X., Shen, Y., Li, J., 2026a. Identifying time-varying and country-specific drivers of sovereign debt risk from credit rating reports. Informatio...

  6. [6]

    arXiv preprint arXiv:2407.21059

    Modular rag: Transforming rag systems into lego-like reconfigurable frameworks. arXiv preprint arXiv:2407.21059. He, J., Liu, C., Hou, G., Jiang, W., Li, J., 2025a. Press: Defending privacy in retrieval-augmented generation via embedding space shifting, in: ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)...

  7. [7]

    14887–14902

    Privacy implications of retrieval-based language models, in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 14887–14902. Karpukhin, V., Oguz, B., Min, S., Lewis, P.S., Wu, L., Edunov, S., Chen, D., Yih, W.t.,

  8. [8]

    arXiv preprint arXiv:2412.04697

    Privacy-preserving retrieval-augmented generation with differential privacy. arXiv preprint arXiv:2412.04697. Lee, S., Lee, D.G.,

  9. [9]

    Information Processing & Management 63, 104379

    Enriching object-aware image–text highlight information for visual question generation. Information Processing & Management 63, 104379. doi: https://doi.org/10.1016/j.ipm.2025.104379. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., et al.,

  10. [10]

    arXiv preprint arXiv:2411.05034

    Mitigating privacy risks in llm embeddings from embedding inversion. arXiv preprint arXiv:2411.05034. Martínez-Murillo, I., Maestre, M.M., Suárez, A., Lloret, E., Moreda, P.,

  11. [11]

    Information Processing & Management 63, 104486

    Assessing the potential of llms as crowdworkers for contextual information generation. Information Processing & Management 63, 104486. doi: https://doi.org/10.1016/j.ipm.2025.104486. Menick, J., Trebacz, M., Mikulik, V., Aslanides, J., Song, F., Chadwick, M., Glaese, M., Young, S., Campbell-Gillingham, L., Irving, G., et al.,

  12. [12]

    arXiv preprint arXiv:2203.11147

    Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147. Morris, J., Kuleshov, V., Shmatikov, V., Rush, A.M.,

  13. [13]

    12448–12460

    Text embeddings reveal (almost) as much as text, in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 12448–12460. Opoku, D.O., Sheng, M., Zhang, Y.,

  14. [14]

    arXiv preprint arXiv:2505.17058

    Do-rag: A domain-specific qa framework using knowledge graph-enhanced retrieval-augmented generation. arXiv preprint arXiv:2505.17058. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.,

  15. [15]

    3715–3734

    Colbertv2: Effective and efficient retrieval via lightweight late interaction, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3715–3734. Shi, W., Min, S., Yasunaga, M., Seo, M., James, R., Lewis, M., Zettlemoyer, L., Yih, W.t.,

  16. [16]

    8371–8384

    Replug: Retrieval-augmented black-box language models, in: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 8371–8384. Shuster, K., Poff, S., Chen, M., Kiela, D., Weston, J.,

  17. [17]

    3784–3803

    Retrieval augmentation reduces hallucination in conversation, in: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 3784–3803. Siriwardhana, S., Weerasekera, R., Wen, E., Kaluarachchi, T., Rana, R., Nanayakkara, S.,

  18. [18]

    5613–5626

    Rear: A relevance-aware retrieval-augmented framework for open-domain question answering, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 5613–5626. Wu, Z., Arora, A., Wang, Z., Geiger, A., Jurafsky, D., Manning, C.D., Potts, C.,

  19. [19]

    Jupyter widgets and extensions for education and research in computational physics and chemistry

    Mspf: A multi-semantic prompting fusion framework for emotion-cause pair extraction in conversations. Information Processing & Management 63, 104356. doi: https://doi.org/10.1016/j.ipm.2025.104356. Xu, G., Feng, J., Wang, Q., 2026a. Learning rules and aligning elements for document-level relation extraction. Information Processing & Management 63, 104511...

  20. [20]

    arXiv preprint arXiv:2503.13514

    Rag-kg-il: A multi-agent hybrid framework for reducing hallucinations and enhancing llm reasoning through rag and incremental knowledge graph learning integration. arXiv preprint arXiv:2503.13514. Yu, Y., Zhuang, Y., Zhang, J., Meng, Y., Ratner, A.J., Krishna, R., Shen, J., Zhang, C.,

  21. [21]

    (Eds.), Advances in Neural Information Processing Systems, Curran Associates, Inc

    Large language model as attributed training data generator: A tale of diversity and bias, in: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (Eds.), Advances in Neural Information Processing Systems, Curran Associates, Inc.. pp. 55734–55784. URL: https://proceedings.neurips.cc/paper_ files/paper/2023/file/ae9500c4f5607caf2eff033c67d...

  22. [22]

    4505–4524

    The good and the bad: Exploring privacy issues in retrieval-augmented generation (rag), in: Findings of the Association for Computational Linguistics: ACL 2024, pp. 4505–4524. Zeng, S., Zhang, J., He, P., Ren, J., Zheng, T., Lu, H., Xu, H., Liu, H., Xing, Y., Tang, J.,

  23. [23]

    24538–24569

    Mitigating the privacy issues in retrieval-augmented generation (rag) via pure synthetic data, in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 24538–24569. Zhang, Y., Wu, J., Li, R., Zhang, T., Song, Y., Li, C., Wang, S., Shen, H., Yin, J., Ge, J., Luo, B.,

  24. [24]

    Information Processing & Management 63, 104505

    Privacy protection in rag: A novel method and evaluation framework. Information Processing & Management 63, 104505. doi: https://doi.org/10.1016/j.ipm.2025.104505. Zhao, X., Liu, S., Yang, S.Y., Miao, C.,

  25. [25]

    4442–4457

    Medrag: Enhancing retrieval-augmented generation with knowledge graph-elicited reasoning for healthcare copilot, in: Proceedings of the ACM on Web Conference 2025, pp. 4442–4457. Zhou, P., Feng, Y., Yang, Z.,

  26. [26]

    arXiv preprint arXiv:2503.15548

    Privacy-aware rag: Secure and isolated knowledge retrieval. arXiv preprint arXiv:2503.15548. Zou, W., Geng, R., Wang, B., Jia, J.,