LLM-Redactor: An Empirical Evaluation of Eight Techniques for Privacy-Preserving LLM Requests
Pith reviewed 2026-05-10 15:04 UTC · model grok-4.3
The pith
A combination of local inference, redaction, and rephrasing reduces PII leaks in LLM prompts to 0.6 percent, with no exact-match leaks observed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the combination of local inference, redaction with placeholder restoration, and semantic rephrasing achieves a 0.6 percent combined leak rate on PII and 31.3 percent on proprietary code, with zero exact PII leaks across 500 samples; the techniques are implemented in an open-source shim and evaluated on a ground-truth benchmark of 1,300 samples containing 4,014 annotations. It further derives a decision rule that selects techniques according to a threat-model budget and workload characterisation.
What carries the argument
The LLM-Redactor compatibility shim that implements the eight techniques for any OpenAI-compatible API and routes requests based on the derived decision rule.
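To make the routing concrete, here is a toy sketch of such a decision rule in Python. Everything in it, the predicate names, the sensitivity flag, the local-fit check, is an assumption for exposition; the paper derives its actual rule from the threat-model budget and workload characterisation, which this sketch only gestures at.

```python
# Toy router in the spirit of the headline A+B+C result: route locally
# when possible, redact and rephrase the rest. Predicate names are
# assumptions for exposition, not the shim's real API.
def choose_techniques(is_sensitive: bool, fits_local_model: bool) -> list[str]:
    """Return technique letters for one request (A = local, B = redact, C = rephrase)."""
    if not is_sensitive:
        return []            # nothing to protect: forward the request unchanged
    if fits_local_model:
        return ["A"]         # route locally when possible ...
    return ["B", "C"]        # ... redact and rephrase what must go to the cloud

# Example: a sensitive prompt that exceeds the local model's capacity.
print(choose_techniques(is_sensitive=True, fits_local_model=False))  # -> ['B', 'C']
```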
Load-bearing premise
The 1,300-sample benchmark with 4,014 annotations accurately represents real-world sensitive content in LLM prompts and the technique implementations correctly enforce the intended privacy properties.
What would settle it
A replication on a larger set of production LLM prompts: leak rates under the A+B+C combination substantially above the reported 0.6 percent on PII would overturn the claim, while comparable rates would confirm it.
Original abstract
Coding agents and LLM-powered applications routinely send potentially sensitive content to cloud LLM APIs where it may be logged, retained, used for training, or subpoenaed. Existing privacy tooling focuses on network-level encryption and organization-level DLP, neither of which addresses the content of prompts themselves. We present a systematic empirical evaluation of eight techniques for privacy-preserving LLM requests: (A) local-only inference, (B) redaction with placeholder restoration, (C) semantic rephrasing, (D) Trusted Execution Environment hosted inference, (E) split inference, (F) fully homomorphic encryption, (G) secret sharing via multi-party computation, and (H) differential-privacy noise. We implement all eight (or a tractable research-stage subset where deployment is not yet feasible) in an open-source shim compatible with MCP and any OpenAI-compatible API. We evaluate the four practical options (A, B, C, H) and their combinations across four workload classes using a ground-truth-labelled leak benchmark of 1,300 samples with 4,014 annotations. Our headline finding is that no single technique dominates: the combination A+B+C (route locally when possible, redact and rephrase the rest) achieves 0.6% combined leak on PII and 31.3% on proprietary code, with zero exact leaks on PII across 500 samples. We present a decision rule that selects the appropriate option(s) from a threat-model budget and workload characterisation. Code, benchmarks, and evaluation harness are released at https://github.com/jayluxferro/llm-redactor.
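To ground technique B before the editorial analysis, here is a minimal sketch of redaction with placeholder restoration. It assumes a single regex email detector for illustration; the shim itself presumably draws on dedicated detectors (its reference list includes Presidio and spaCy), so treat this as a sketch of the mechanism, not the authors' implementation.

```python
# Minimal sketch of technique B: redact PII to placeholders before the
# request leaves the machine, restore placeholders in the reply. The
# regex email detector is an illustrative stand-in for a real detector.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each detected entity with a placeholder; keep the mapping locally."""
    mapping: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        token = f"<PII_{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    return EMAIL.sub(_swap, prompt), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Map placeholders in the model's reply back to the original values."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

redacted, mapping = redact("Email alice@example.com about the outage.")
# `redacted` is what goes to the cloud API; `mapping` never leaves the machine.
reply = restore("Drafted a message to <PII_0> about the outage.", mapping)
print(redacted, "|", reply)
```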
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript conducts a systematic empirical evaluation of eight techniques (local inference, redaction with placeholder restoration, semantic rephrasing, TEE inference, split inference, FHE, MPC secret sharing, and DP noise) for reducing privacy leaks in LLM API requests. All techniques are implemented (or subsets where deployment is infeasible) in an open-source shim compatible with MCP and OpenAI APIs. Using a ground-truth-labeled benchmark of 1,300 samples with 4,014 annotations across four workload classes, the authors evaluate the four practical options and combinations, concluding that no single technique dominates and that the A+B+C combination (local routing when possible, plus redaction and rephrasing) achieves 0.6% combined PII leak and 31.3% proprietary code leak, with zero exact PII leaks across 500 samples. A decision rule based on threat model and workload is also presented, with code, benchmarks, and harness released.
Significance. If the empirical results hold, the work provides actionable, reproducible guidance for practitioners on mitigating prompt-level privacy risks in LLM deployments, bridging the gap between theoretical privacy mechanisms and deployable tooling. The open release of the shim, labeled benchmark, and evaluation harness is a clear strength that enables direct reproduction and extension by the community.
Major comments (3)
- [§5] §5 (Evaluation), paragraph on leak detection: the automated leak detector's handling of semantic rephrasing (technique C) and placeholder restoration after redaction (technique B) is not described in sufficient detail to verify that no sensitive content is reintroduced; this directly underpins the headline 0.6% PII and zero-exact-leak claims on the 500-sample subset.
- [§4] §4 (Implementation): the custom shim implementations of redaction, rephrasing, and the leak detector lack any formal verification, third-party audit, or unit-test coverage metrics; an error in placeholder restoration or equivalence detection would invalidate the reported leak rates for combinations A+B+C.
- [§5.1] §5.1 (Benchmark construction): the process for selecting and annotating the 1,300 samples (4,014 annotations) is not specified, including diversity across workload classes and how ground-truth labels were validated; this is load-bearing for assessing whether the 0.6%/31.3% figures generalize beyond the benchmark.
Minor comments (3)
- [Abstract] The abstract and §1 should explicitly state the exact definition of 'combined leak' (e.g., whether it is the union of exact and near-match leaks, or the leak rate under the combined A+B+C pipeline) to avoid ambiguity in the headline numbers.
- [§6] Figure 3 (or equivalent decision-rule diagram) would benefit from clearer labeling of the threat-model axes and workload characteristics used in the selection logic.
- [§2] A few citations to prior redaction and rephrasing baselines (e.g., in related work) appear to be missing specific page or section references for the compared methods.
Simulated Author's Rebuttal
We thank the referee for their constructive review and positive evaluation of the work's significance. We address each of the major comments in detail below, providing explanations and indicating where revisions will be made to the manuscript to improve clarity and address the concerns.
Point-by-point responses
Referee: [§5] §5 (Evaluation), paragraph on leak detection: the automated leak detector's handling of semantic rephrasing (technique C) and placeholder restoration after redaction (technique B) is not described in sufficient detail to verify that no sensitive content is reintroduced; this directly underpins the headline 0.6% PII and zero-exact-leak claims on the 500-sample subset.
Authors: We agree that the description in the current manuscript is insufficient for full verification. The leak detector employs exact matching for PII entities and cosine similarity on embeddings for code snippets; restoration in technique B maps placeholders back only in the final output, and rephrasing in C generalizes sensitive terms while preserving semantics. We will revise the relevant paragraph in §5 to include a detailed algorithm description and examples of how detection is applied post-restoration and post-rephrasing to confirm that no sensitive content is reintroduced. This will directly support the reported leak rates. Revision: yes.
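A minimal sketch of the two checks this response describes, exact matching for PII and embedding cosine similarity for code, is given below. The embedding vectors and the 0.8 threshold are placeholders, not values taken from the paper.

```python
# Sketch of the two leak checks described above. Exact PII leaks are
# substring matches; semantic code leaks are embedding-similarity hits.
import math

def exact_pii_leak(outgoing: str, pii_entities: list[str]) -> bool:
    """Exact leak: an annotated PII string appears verbatim in the outgoing request."""
    return any(entity in outgoing for entity in pii_entities)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_code_leak(outgoing_vec: list[float],
                       snippet_vecs: list[list[float]],
                       threshold: float = 0.8) -> bool:
    """Semantic leak: some annotated snippet's embedding remains too close to
    the outgoing request's embedding after redaction and rephrasing."""
    return any(cosine(outgoing_vec, v) >= threshold for v in snippet_vecs)

# Toy check: a rephrased request that still embeds next to a protected snippet leaks.
print(exact_pii_leak("call <PII_0> today", ["alice@example.com"]))  # -> False
print(semantic_code_leak([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]))     # -> True
```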
Referee: [§4] §4 (Implementation): the custom shim implementations of redaction, rephrasing, and the leak detector lack any formal verification, third-party audit, or unit-test coverage metrics; an error in placeholder restoration or equivalence detection would invalidate the reported leak rates for combinations A+B+C.
Authors: We recognize that implementation correctness is critical. Formal verification and third-party audits are not included as they exceed the scope of this empirical evaluation study. However, we will update §4 to report unit-test coverage metrics for the shim (we have since measured 82% coverage on core components using standard Python testing tools). Additional unit tests have been added specifically for placeholder restoration logic and semantic equivalence detection to reduce the possibility of errors affecting the A+B+C results. The full open-source code allows for independent verification and extension. Revision: partial.
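For illustration, a pytest-style test of the restoration round-trip might look like the sketch below. It assumes the hypothetical redact/restore interface from the sketch after the abstract (saved as redactor_sketch.py); the module name and interface are assumptions, not the shim's real test suite.

```python
# Pytest-style sketch of the restoration property technique B must satisfy.
# Assumes the earlier redact/restore sketch is saved as redactor_sketch.py.
from redactor_sketch import redact, restore

def test_restoration_round_trip():
    prompt = "Ping bob@corp.example and carol@corp.example today."
    redacted, mapping = redact(prompt)
    # Nothing sensitive survives redaction.
    assert "bob@corp.example" not in redacted
    assert "carol@corp.example" not in redacted
    # Distinct entities receive distinct placeholders.
    assert len(mapping) == 2
    # Echoing the redacted text through restore recovers the original exactly.
    assert restore(redacted, mapping) == prompt
```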
Referee: [§5.1] §5.1 (Benchmark construction): the process for selecting and annotating the 1,300 samples (4,014 annotations) is not specified, including diversity across workload classes and how ground-truth labels were validated; this is load-bearing for assessing whether the 0.6%/31.3% figures generalize beyond the benchmark.
Authors: We concur that expanded details on benchmark construction are necessary for assessing generalizability. The manuscript provides only a high-level overview. In the revised version, we will elaborate in §5.1 on the sample selection (stratified sampling from public datasets, synthetic generation for code workloads, and curated real-world examples to ensure diversity across the four classes with 325 samples each), the annotation protocol (detailed guidelines, multiple independent annotators per sample, majority voting for ground-truth, and reported inter-annotator agreement scores), and validation steps. This will better contextualize the 0.6% PII and 31.3% code leak figures. Revision: yes.
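As a sketch of the annotation aggregation this response outlines, the code below computes majority-vote labels and a simple observed pairwise agreement. The three-annotator setup and label names are assumptions; a real protocol would likely report a chance-corrected statistic such as Fleiss' kappa.

```python
# Sketch of ground-truth aggregation: majority vote per sample plus a
# simple observed pairwise agreement over annotators.
from collections import Counter

def majority_label(votes: list[str]) -> str | None:
    """Ground truth = strict majority across annotators, else left unresolved."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

def pairwise_agreement(all_votes: list[list[str]]) -> float:
    """Fraction of annotator pairs that agree, pooled over samples."""
    agree = total = 0
    for votes in all_votes:
        for i in range(len(votes)):
            for j in range(i + 1, len(votes)):
                total += 1
                agree += votes[i] == votes[j]
    return agree / total

votes = [["pii", "pii", "none"], ["code", "code", "code"]]
print([majority_label(v) for v in votes])   # -> ['pii', 'code']
print(round(pairwise_agreement(votes), 3))  # -> 0.667
```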
Circularity Check
No circularity: purely empirical evaluation on external benchmark
Full rationale
The paper performs an empirical comparison of eight privacy techniques by implementing them in an open-source shim and measuring leak rates on a ground-truth-labelled benchmark of 1,300 samples with 4,014 annotations. No derivations, equations, fitted parameters, predictions, or first-principles results are present; the headline claims (0.6% PII leak, 31.3% code leak for A+B+C) are direct measurements against the external benchmark labels and released code. No self-citation load-bearing steps, self-definitional constructs, or ansatz smuggling occur. The evaluation is self-contained against the provided benchmark and implementations, satisfying the criteria for score 0.
Axiom & Free-Parameter Ledger
None: purely empirical study with no axioms, derivations, or fitted free parameters to log (see the circularity rationale above).
Reference graph
Works this paper leans on
- [1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 308–318. ACM, 2016.
- [2] Amazon Web Services. AWS Nitro Enclaves, 2024. URL https://aws.amazon.com/ec2/nitro/nitro-enclaves/. Accessed 2026-04-12.
- [3] Anthropic. Model Context Protocol specification, 2024. URL https://modelcontextprotocol.io. Accessed 2026-04-12.
- [4] Apple. Private Cloud Compute: A new frontier for AI privacy in the cloud, 2024. URL https://security.apple.com/blog/private-cloud-compute/. Accessed 2026-04-12.
- [5] Fabian Boemer, Yixing Lao, Rosario Cammarota, and Casimir Wierzynski. nGraph-HE: A graph compiler for deep learning on homomorphically encrypted data. In Proceedings of the 16th ACM International Conference on Computing Frontiers (CF), pages 3–13. ACM, 2019.
- [6] Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Maksim Riabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel. Petals: Collaborative inference and fine-tuning of large models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations, pages 38–44, 2023.
- [7] Saptarshi Das, Anushka Dey, Arnab Pal, and Nupur Roy. Security and privacy challenges of large language models: A survey. ACM Computing Surveys, 57(6), 2025.
- [8] Haonan Duan, Adam Dziedzic, Nicolas Papernot, and Franziska Boenisch. Flocks of stochastic parrots: Differentially private prompt learning for large language models. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023.
- [9] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 201–210. JMLR.org, 2016.
- [10] Otkrist Gupta and Ramesh Raskar. Distributed learning of deep neural network over multiple agents. Journal of Network and Computer Applications, 116:1–8, 2018.
- [11] Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. spaCy: Industrial-strength natural language processing in Python. 2020. doi: 10.5281/zenodo.1212303.
- [12] Marcel Keller. MP-SPDZ: A versatile framework for multi-party computation. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 1575–1590. ACM, 2020.
- [13] Brian Knott, Shobha Venkataraman, Awni Hannun, Shubho Sheshadri, Zihang Zheng, et al. CrypTen: Secure multi-party computation meets machine learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, 2021.
- [14] Oscar Li, Jiankai Sun, Xin Wang, Richard Gauch, Mudhakar Srivatsa, and Kuan He. Label leakage and protection in two-party split learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.
- [15] Microsoft. Presidio: Data protection and de-identification SDK, 2024. URL https://github.com/microsoft/presidio. Open-source framework for PII detection and anonymization.
- [16] Microsoft Azure. Azure confidential computing, 2024. URL https://azure.microsoft.com/en-us/solutions/confidential-compute/. Accessed 2026-04-12.
- [17] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), pages 19–38. IEEE, 2017.
- [18] NVIDIA. Confidential computing on H100 Tensor Core GPUs, 2023. URL https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/. Accessed 2026-04-12.
- [19] Ollama. Ollama: Run large language models locally, 2024. URL https://ollama.com. Accessed 2026-04-12.
- [20] OpenAI. OpenAI API reference, 2024. URL https://platform.openai.com/docs/api-reference. Accessed 2026-04-12.
- [21] Florian Tramer and Dan Boneh. Slalom: Fast, verifiable and private execution of neural networks in trusted hardware. In Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019.
- [22] Stavros Volos, Kapil Vaswani, and Rodrigo Bruno. Graviton: Trusted execution environments on GPUs. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 681–696. USENIX Association, 2018.
- [23] Duzhen Yao et al. A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, 4(2):100211, 2024.
- [24] Zama. Concrete ML: Privacy-preserving machine learning using fully homomorphic encryption. URL https://github.com/zama-ai/concrete-ml.