LLM-Redactor: An Empirical Evaluation of Eight Techniques for Privacy-Preserving LLM Requests
Pith reviewed 2026-05-10 15:04 UTC · model grok-4.3
The pith
A combination of local inference, redaction, and rephrasing reduces PII leaks in LLM prompts to 0.6 percent, with no exact-match leaks observed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the combination of local inference, redaction with placeholder restoration, and semantic rephrasing achieves a 0.6 percent combined leak rate on PII and 31.3 percent on proprietary code, with zero exact PII leaks across 500 samples; the techniques are implemented in an open-source shim and evaluated on a ground-truth benchmark of 1,300 samples containing 4,014 annotations. It further derives a decision rule that selects techniques according to a threat-model budget and workload characterisation.
What carries the argument
The LLM-Redactor compatibility shim that implements the eight techniques for any OpenAI-compatible API and routes requests based on the derived decision rule.
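To make the routing concrete, here is a toy sketch of such a decision rule in Python. Everything in it, the predicate names, the sensitivity flag, the local-fit check, is an assumption for exposition; the paper derives its actual rule from the threat-model budget and workload characterisation, which this sketch only gestures at.

```python
# Toy router in the spirit of the headline A+B+C result: route locally
# when possible, redact and rephrase the rest. Predicate names are
# assumptions for exposition, not the shim's real API.
def choose_techniques(is_sensitive: bool, fits_local_model: bool) -> list[str]:
    """Return technique letters for one request (A = local, B = redact, C = rephrase)."""
    if not is_sensitive:
        return []            # nothing to protect: forward the request unchanged
    if fits_local_model:
        return ["A"]         # route locally when possible ...
    return ["B", "C"]        # ... redact and rephrase what must go to the cloud

# Example: a sensitive prompt that exceeds the local model's capacity.
print(choose_techniques(is_sensitive=True, fits_local_model=False))  # -> ['B', 'C']
```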
Load-bearing premise
The 1,300-sample benchmark with 4,014 annotations accurately represents real-world sensitive content in LLM prompts and the technique implementations correctly enforce the intended privacy properties.
What would settle it
A replication on a larger set of production LLM prompts: leak rates under the A+B+C combination substantially above the reported 0.6 percent on PII would overturn the claim, while comparable rates would confirm it.
Original abstract
Coding agents and LLM-powered applications routinely send potentially sensitive content to cloud LLM APIs where it may be logged, retained, used for training, or subpoenaed. Existing privacy tooling focuses on network-level encryption and organization-level DLP, neither of which addresses the content of prompts themselves. We present a systematic empirical evaluation of eight techniques for privacy-preserving LLM requests: (A) local-only inference, (B) redaction with placeholder restoration, (C) semantic rephrasing, (D) Trusted Execution Environment hosted inference, (E) split inference, (F) fully homomorphic encryption, (G) secret sharing via multi-party computation, and (H) differential-privacy noise. We implement all eight (or a tractable research-stage subset where deployment is not yet feasible) in an open-source shim compatible with MCP and any OpenAI-compatible API. We evaluate the four practical options (A, B, C, H) and their combinations across four workload classes using a ground-truth-labelled leak benchmark of 1,300 samples with 4,014 annotations. Our headline finding is that no single technique dominates: the combination A+B+C (route locally when possible, redact and rephrase the rest) achieves 0.6% combined leak on PII and 31.3% on proprietary code, with zero exact leaks on PII across 500 samples. We present a decision rule that selects the appropriate option(s) from a threat-model budget and workload characterisation. Code, benchmarks, and evaluation harness are released at https://github.com/jayluxferro/llm-redactor.
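To ground technique B before the editorial analysis, here is a minimal sketch of redaction with placeholder restoration. It assumes a single regex email detector for illustration; the shim itself presumably draws on dedicated detectors (its reference list includes Presidio and spaCy), so treat this as a sketch of the mechanism, not the authors' implementation.

```python
# Minimal sketch of technique B: redact PII to placeholders before the
# request leaves the machine, restore placeholders in the reply. The
# regex email detector is an illustrative stand-in for a real detector.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each detected entity with a placeholder; keep the mapping locally."""
    mapping: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        token = f"<PII_{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    return EMAIL.sub(_swap, prompt), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Map placeholders in the model's reply back to the original values."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

redacted, mapping = redact("Email alice@example.com about the outage.")
# `redacted` is what goes to the cloud API; `mapping` never leaves the machine.
reply = restore("Drafted a message to <PII_0> about the outage.", mapping)
print(redacted, "|", reply)
```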
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript conducts a systematic empirical evaluation of eight techniques (local inference, redaction with placeholder restoration, semantic rephrasing, TEE inference, split inference, FHE, MPC secret sharing, and DP noise) for reducing privacy leaks in LLM API requests. All techniques are implemented (or subsets where deployment is infeasible) in an open-source shim compatible with MCP and OpenAI APIs. Using a ground-truth-labeled benchmark of 1,300 samples with 4,014 annotations across four workload classes, the authors evaluate the four practical options and combinations, concluding that no single technique dominates and that the A+B+C combination (local routing when possible, plus redaction and rephrasing) achieves 0.6% combined PII leak and 31.3% proprietary code leak, with zero exact PII leaks across 500 samples. A decision rule based on threat model and workload is also presented, with code, benchmarks, and harness released.
Significance. If the empirical results hold, the work provides actionable, reproducible guidance for practitioners on mitigating prompt-level privacy risks in LLM deployments, bridging the gap between theoretical privacy mechanisms and deployable tooling. The open release of the shim, labeled benchmark, and evaluation harness is a clear strength that enables direct reproduction and extension by the community.
Major comments (3)
- [§5] §5 (Evaluation), paragraph on leak detection: the automated leak detector's handling of semantic rephrasing (technique C) and placeholder restoration after redaction (technique B) is not described in sufficient detail to verify that no sensitive content is reintroduced; this directly underpins the headline 0.6% PII and zero-exact-leak claims on the 500-sample subset.
- [§4] §4 (Implementation): the custom shim implementations of redaction, rephrasing, and the leak detector lack any formal verification, third-party audit, or unit-test coverage metrics; an error in placeholder restoration or equivalence detection would invalidate the reported leak rates for combinations A+B+C.
- [§5.1] §5.1 (Benchmark construction): the process for selecting and annotating the 1,300 samples (4,014 annotations) is not specified, including diversity across workload classes and how ground-truth labels were validated; this is load-bearing for assessing whether the 0.6%/31.3% figures generalize beyond the benchmark.
Minor comments (3)
- [Abstract] The abstract and §1 should explicitly state the exact definition of 'combined leak' (e.g., whether it is the union of exact and near-match leaks, or the leak rate under the combined A+B+C pipeline) to avoid ambiguity in the headline numbers.
- [§6] Figure 3 (or equivalent decision-rule diagram) would benefit from clearer labeling of the threat-model axes and workload characteristics used in the selection logic.
- [§2] A few citations to prior redaction and rephrasing baselines (e.g., in related work) appear to be missing specific page or section references for the compared methods.
Simulated Author's Rebuttal
We thank the referee for their constructive review and positive evaluation of the work's significance. We address each of the major comments in detail below, providing explanations and indicating where revisions will be made to the manuscript to improve clarity and address the concerns.
Point-by-point responses
Referee: [§5] §5 (Evaluation), paragraph on leak detection: the automated leak detector's handling of semantic rephrasing (technique C) and placeholder restoration after redaction (technique B) is not described in sufficient detail to verify that no sensitive content is reintroduced; this directly underpins the headline 0.6% PII and zero-exact-leak claims on the 500-sample subset.
Authors: We agree that the description in the current manuscript is insufficient for full verification. The leak detector employs exact matching for PII entities and cosine similarity on embeddings for code snippets; restoration in technique B maps placeholders back only in the final output, and rephrasing in C generalizes sensitive terms while preserving semantics. We will revise the relevant paragraph in §5 to include a detailed algorithm description and examples of how detection is applied post-restoration and post-rephrasing to confirm that no sensitive content is reintroduced. This will directly support the reported leak rates. Revision: yes.
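A minimal sketch of the two checks this response describes, exact matching for PII and embedding cosine similarity for code, is given below. The embedding vectors and the 0.8 threshold are placeholders, not values taken from the paper.

```python
# Sketch of the two leak checks described above. Exact PII leaks are
# substring matches; semantic code leaks are embedding-similarity hits.
import math

def exact_pii_leak(outgoing: str, pii_entities: list[str]) -> bool:
    """Exact leak: an annotated PII string appears verbatim in the outgoing request."""
    return any(entity in outgoing for entity in pii_entities)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_code_leak(outgoing_vec: list[float],
                       snippet_vecs: list[list[float]],
                       threshold: float = 0.8) -> bool:
    """Semantic leak: some annotated snippet's embedding remains too close to
    the outgoing request's embedding after redaction and rephrasing."""
    return any(cosine(outgoing_vec, v) >= threshold for v in snippet_vecs)

# Toy check: a rephrased request that still embeds next to a protected snippet leaks.
print(exact_pii_leak("call <PII_0> today", ["alice@example.com"]))  # -> False
print(semantic_code_leak([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]))     # -> True
```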
Referee: [§4] §4 (Implementation): the custom shim implementations of redaction, rephrasing, and the leak detector lack any formal verification, third-party audit, or unit-test coverage metrics; an error in placeholder restoration or equivalence detection would invalidate the reported leak rates for combinations A+B+C.
Authors: We recognize that implementation correctness is critical. Formal verification and third-party audits are not included as they exceed the scope of this empirical evaluation study. However, we will update §4 to report unit-test coverage metrics for the shim (we have since measured 82% coverage on core components using standard Python testing tools). Additional unit tests have been added specifically for placeholder restoration logic and semantic equivalence detection to reduce the possibility of errors affecting the A+B+C results. The full open-source code allows for independent verification and extension. Revision: partial.
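For illustration, a pytest-style test of the restoration round-trip might look like the sketch below. It assumes the hypothetical redact/restore interface from the sketch after the abstract (saved as redactor_sketch.py); the module name and interface are assumptions, not the shim's real test suite.

```python
# Pytest-style sketch of the restoration property technique B must satisfy.
# Assumes the earlier redact/restore sketch is saved as redactor_sketch.py.
from redactor_sketch import redact, restore

def test_restoration_round_trip():
    prompt = "Ping bob@corp.example and carol@corp.example today."
    redacted, mapping = redact(prompt)
    # Nothing sensitive survives redaction.
    assert "bob@corp.example" not in redacted
    assert "carol@corp.example" not in redacted
    # Distinct entities receive distinct placeholders.
    assert len(mapping) == 2
    # Echoing the redacted text through restore recovers the original exactly.
    assert restore(redacted, mapping) == prompt
```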
Referee: [§5.1] §5.1 (Benchmark construction): the process for selecting and annotating the 1,300 samples (4,014 annotations) is not specified, including diversity across workload classes and how ground-truth labels were validated; this is load-bearing for assessing whether the 0.6%/31.3% figures generalize beyond the benchmark.
Authors: We concur that expanded details on benchmark construction are necessary for assessing generalizability. The manuscript provides only a high-level overview. In the revised version, we will elaborate in §5.1 on the sample selection (stratified sampling from public datasets, synthetic generation for code workloads, and curated real-world examples to ensure diversity across the four classes with 325 samples each), the annotation protocol (detailed guidelines, multiple independent annotators per sample, majority voting for ground-truth, and reported inter-annotator agreement scores), and validation steps. This will better contextualize the 0.6% PII and 31.3% code leak figures. Revision: yes.
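As a sketch of the annotation aggregation this response outlines, the code below computes majority-vote labels and a simple observed pairwise agreement. The three-annotator setup and label names are assumptions; a real protocol would likely report a chance-corrected statistic such as Fleiss' kappa.

```python
# Sketch of ground-truth aggregation: majority vote per sample plus a
# simple observed pairwise agreement over annotators.
from collections import Counter

def majority_label(votes: list[str]) -> str | None:
    """Ground truth = strict majority across annotators, else left unresolved."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

def pairwise_agreement(all_votes: list[list[str]]) -> float:
    """Fraction of annotator pairs that agree, pooled over samples."""
    agree = total = 0
    for votes in all_votes:
        for i in range(len(votes)):
            for j in range(i + 1, len(votes)):
                total += 1
                agree += votes[i] == votes[j]
    return agree / total

votes = [["pii", "pii", "none"], ["code", "code", "code"]]
print([majority_label(v) for v in votes])   # -> ['pii', 'code']
print(round(pairwise_agreement(votes), 3))  # -> 0.667
```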
Circularity Check
No circularity: purely empirical evaluation on external benchmark
Full rationale
The paper performs an empirical comparison of eight privacy techniques by implementing them in an open-source shim and measuring leak rates on a ground-truth-labelled benchmark of 1,300 samples with 4,014 annotations. No derivations, equations, fitted parameters, predictions, or first-principles results are present; the headline claims (0.6% PII leak, 31.3% code leak for A+B+C) are direct measurements against the external benchmark labels and released code. No self-citation load-bearing steps, self-definitional constructs, or ansatz smuggling occur. The evaluation is self-contained against the provided benchmark and implementations, satisfying the criteria for score 0.
Axiom & Free-Parameter Ledger
None: purely empirical study with no axioms, derivations, or fitted free parameters to log (see the circularity rationale above).
Reference graph
Works this paper leans on
- [1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 308–318. ACM, 2016.
- [2] Amazon Web Services. AWS Nitro Enclaves, 2024. URL https://aws.amazon.com/ec2/nitro/nitro-enclaves/. Accessed 2026-04-12.
- [3] Anthropic. Model Context Protocol specification, 2024. URL https://modelcontextprotocol.io. Accessed 2026-04-12.
- [4] Apple. Private Cloud Compute: A new frontier for AI privacy in the cloud, 2024. URL https://security.apple.com/blog/private-cloud-compute/. Accessed 2026-04-12.
- [5] Fabian Boemer, Yixing Lao, Rosario Cammarota, and Casimir Wierzynski. nGraph-HE: A graph compiler for deep learning on homomorphically encrypted data. In Proceedings of the 16th ACM International Conference on Computing Frontiers (CF), pages 3–13. ACM, 2019.
- [6] Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Maksim Riabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel. Petals: Collaborative inference and fine-tuning of large models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations, pages 38–44, 2023.
- [7] Saptarshi Das, Anushka Dey, Arnab Pal, and Nupur Roy. Security and privacy challenges of large language models: A survey. ACM Computing Surveys, 57(6), 2025.
- [8] Haonan Duan, Adam Dziedzic, Nicolas Papernot, and Franziska Boenisch. Flocks of stochastic parrots: Differentially private prompt learning for large language models. In Advances in Neural Information Processing Systems (NeurIPS), volume 36, 2023.
- [9] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 201–210. JMLR.org, 2016.
- [10] Otkrist Gupta and Ramesh Raskar. Distributed learning of deep neural network over multiple agents. Journal of Network and Computer Applications, 116:1–8, 2018.
- [11] Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. spaCy: Industrial-strength natural language processing in Python. 2020. doi: 10.5281/zenodo.1212303.
- [12] Marcel Keller. MP-SPDZ: A versatile framework for multi-party computation. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 1575–1590. ACM, 2020.
- [13] Brian Knott, Shobha Venkataraman, Awni Hannun, Shubho Sheshadri, Zihang Zheng, et al. CrypTen: Secure multi-party computation meets machine learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, 2021.
- [14] Oscar Li, Jiankai Sun, Xin Wang, Richard Gauch, Mudhakar Srivatsa, and Kuan He. Label leakage and protection in two-party split learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.
- [15] Microsoft. Presidio: Data protection and de-identification SDK, 2024. URL https://github.com/microsoft/presidio. Open-source framework for PII detection and anonymization.
- [16] Microsoft Azure. Azure confidential computing, 2024. URL https://azure.microsoft.com/en-us/solutions/confidential-compute/. Accessed 2026-04-12.
- [17] Payman Mohassel and Yupeng Zhang. SecureML: A system for scalable privacy-preserving machine learning. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), pages 19–38. IEEE, 2017.
- [18] NVIDIA. Confidential computing on H100 Tensor Core GPUs, 2023. URL https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/. Accessed 2026-04-12.
- [19] Ollama. Ollama: Run large language models locally, 2024. URL https://ollama.com. Accessed 2026-04-12.
- [20] OpenAI. OpenAI API reference, 2024. URL https://platform.openai.com/docs/api-reference. Accessed 2026-04-12.
- [21] Florian Tramer and Dan Boneh. Slalom: Fast, verifiable and private execution of neural networks in trusted hardware. In Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019.
- [22] Stavros Volos, Kapil Vaswani, and Rodrigo Bruno. Graviton: Trusted execution environments on GPUs. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 681–696. USENIX Association, 2018.
- [23] Duzhen Yao et al. A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, 4(2):100211, 2024.
- [24] Zama. Concrete ML: Privacy-preserving machine learning using fully homomorphic encryption. URL https://github.com/zama-ai/concrete-ml.