pith. sign in

arxiv: 2606.29567 · v1 · pith:XS76N6QHnew · submitted 2026-06-28 · 💻 cs.CR · cs.CL

SurrogateShield: Beyond Redaction for High-Utility, Privacy-Preserving LLM Interactions

Pith reviewed 2026-06-30 06:55 UTC · model grok-4.3

classification 💻 cs.CR cs.CL
keywords PII protectionLLM privacysurrogate substitutionredaction alternativesclient-side proxysemantic utilityk-anonymity
0
0 comments X

The pith

SurrogateShield replaces PII in LLM queries with locally generated surrogates to preserve semantic utility while blocking real data transmission.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SurrogateShield as a client-side system that detects 22 PII types and quasi-identifiers via a three-stage cascade, substitutes them with type-consistent surrogates before sending the query, and restores originals in the response using a local encrypted map. This approach is shown to outperform placeholder redaction by 13.26 percentage points in BERTScore on a 1,124-query corpus while ensuring no real PII leaves the device. The work grounds its detection in k-anonymity principles and reports 98.87% F1 detection plus zero recovery in an adversarial trial.

Core claim

Surrogate substitution with local generation and restoration yields higher semantic fidelity than redaction for LLM interactions while restricting real PII transmission across query types, as measured by BERTScore gains and adversarial non-recovery.

What carries the argument

The surrogate substitution mechanism, which detects PII via PatternScan-EntityTrace-ContextGuard cascade, generates type-consistent local surrogates, and uses an AES-256-GCM encrypted ShadowMap for per-conversation original restoration.

If this is right

  • LLM queries containing PII can be processed by remote models without the original values ever leaving the local device.
  • Response quality remains closer to the unmodified query than with placeholder redaction.
  • The local pipeline blocks real PII transmission for all tested query types in the evaluated corpus.
  • A prompted LLM adversary recovers none of the original values from surrogate-substituted messages in 100 trials.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same substitution pattern could extend to other API endpoints that receive user text, such as search or analytics services.
  • Integration with on-device models might become less necessary if surrogate handling proves reliable at scale.
  • Expanding the cascade to additional quasi-identifiers would require re-testing the F1 and adversarial metrics.

Load-bearing premise

The detection cascade finds every instance of the 22 PII types and combinations without missing any that would let real data transmit.

What would settle it

A set of queries containing PII that the cascade misses, followed by successful recovery of the original values from the transmitted message or LLM response.

Figures

Figures reproduced from arXiv: 2606.29567 by Sherwin Vishesh Jathanna.

Figure 1
Figure 1. Figure 1: SurrogateShield end-to-end pipeline. The LLM API [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
read the original abstract

LLM-based assistants transmit user queries verbatim to third-party API endpoints that lie outside the user's audit or control. When those queries contain personally identifiable information (PII), the data persists on remote infrastructure subject to breach, subpoena, or policy change. Placeholder redaction (the prevailing mitigation) suppresses PII at the cost of semantic coherence, producing structurally degraded queries and correspondingly degraded responses. We present SurrogateShield, a client-side proxy that substitutes detected PII with locally generated, type-consistent surrogate values prior to transmission and restores originals in the response. No real PII crosses the network boundary. Detection runs through a three-stage cascade (PatternScan, EntityTrace, and ContextGuard) covering 22 PII types and quasi-identifier combinations grounded in Sweeney's k-anonymity framework. Surrogate-to-original mappings are sealed in an AES-256-GCM encrypted per-conversation ShadowMap that never leaves the device. Evaluations on a 1,124-query corpus demonstrate that the cascade reliably detects PII, achieving an overall F1 score of 98.87%. Surrogate substitution substantially outperforms placeholder redaction in semantic utility, yielding a 13.26 pp improvement in BERTScore (roberta-large), from 81.59% to 94.85%. Within this corpus, the local pipeline restricted real PII transmission across all tested query types; in a 100-query adversarial trial, a prompted LLM adversary recovered no original values from surrogate-substituted messages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents SurrogateShield, a client-side proxy that detects PII via a three-stage cascade (PatternScan, EntityTrace, ContextGuard) covering 22 types grounded in k-anonymity, substitutes detected entities with locally generated surrogates before transmission to LLM APIs, and restores originals locally using an encrypted ShadowMap. On a 1,124-query corpus it reports 98.87% F1 for detection, a 13.26 pp BERTScore (roberta-large) gain over placeholder redaction (81.59% to 94.85%), restriction of real PII transmission on all tested queries, and zero recovery by a prompted LLM adversary in a 100-query trial.

Significance. If the detection cascade can be shown to achieve zero false negatives on the claimed PII types and quasi-identifiers, the work supplies a concrete, deployable alternative to redaction that materially improves response quality while keeping real PII off the network; the empirical comparison and adversarial trial are concrete strengths.

major comments (2)
  1. [Abstract] Abstract: the central privacy claim ('no real PII crosses the network boundary' and 'restricted real PII transmission across all tested query types') rests on the three-stage cascade having zero false negatives for all 22 PII types and quasi-identifier combinations. Only an aggregate F1 of 98.87% is supplied; neither per-type false-negative rates nor any evaluation on out-of-distribution or unseen phrasings are reported.
  2. [Abstract] Abstract: the reported utility metrics (13.26 pp BERTScore gain, 98.87% F1) are presented without error bars, dataset composition breakdown by PII type or query category, or full per-query results, preventing assessment of whether the gains and the 'across all tested query types' statement are robust.
minor comments (2)
  1. The manuscript would benefit from an explicit description of how the ShadowMap is initialized, keyed, and garbage-collected per conversation.
  2. The 100-query adversarial trial description should state the exact prompt template and model used for the adversary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below, indicating planned revisions to improve transparency and robustness of the reported results.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central privacy claim ('no real PII crosses the network boundary' and 'restricted real PII transmission across all tested query types') rests on the three-stage cascade having zero false negatives for all 22 PII types and quasi-identifier combinations. Only an aggregate F1 of 98.87% is supplied; neither per-type false-negative rates nor any evaluation on out-of-distribution or unseen phrasings are reported.

    Authors: We agree that per-type false-negative rates would strengthen the privacy claim. The 1,124-query corpus was designed to cover all 22 PII types and quasi-identifier combinations, and the statement that real PII transmission was restricted on all tested queries indicates zero observed false negatives within this specific corpus. We did not report per-type breakdowns in the original submission. In the revised manuscript we will add a table with detection performance (precision, recall, F1) per PII type. Evaluation on out-of-distribution or unseen phrasings was not performed; we will explicitly note this as a limitation and qualify the generalizability claim accordingly. revision: partial

  2. Referee: [Abstract] Abstract: the reported utility metrics (13.26 pp BERTScore gain, 98.87% F1) are presented without error bars, dataset composition breakdown by PII type or query category, or full per-query results, preventing assessment of whether the gains and the 'across all tested query types' statement are robust.

    Authors: We acknowledge that error bars, dataset breakdowns, and per-query results would improve assessment of robustness. The reported figures are aggregates over the full 1,124-query corpus. In the revised version we will include (i) standard deviation error bars for the BERTScore and F1 metrics, (ii) a breakdown of corpus composition by PII type and query category, and (iii) a link to the full per-query results as supplementary material. These additions will allow readers to evaluate the consistency of the gains across query types. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical system description only

full rationale

The paper describes a client-side PII detection and surrogate substitution pipeline with reported F1 scores and utility metrics on a fixed 1,124-query corpus. No equations, fitted parameters, predictions, or derivations appear. The k-anonymity reference is to external prior work (Sweeney). Central privacy claim rests on measured cascade behavior within the tested set rather than any self-referential construction or self-citation chain. This is the expected non-finding for an applied systems paper without mathematical modeling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The work is an engineering system description rather than a theoretical derivation, so the ledger contains only standard cryptographic assumptions and the system components themselves.

axioms (1)
  • standard math AES-256-GCM encryption provides confidentiality for the per-conversation ShadowMap stored on the device.
    Standard assumption on symmetric encryption primitives.
invented entities (1)
  • ShadowMap no independent evidence
    purpose: Encrypted local store of surrogate-to-original mappings that never leaves the device.
    New data structure introduced by the system.

pith-pipeline@v0.9.1-grok · 5797 in / 1422 out tokens · 43056 ms · 2026-06-30T06:55:28.051708+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references · 7 canonical work pages · 2 internal anchors

  1. [1]

    Aggarwal and Philip S

    Charu C. Aggarwal and Philip S. Yu. 2008.Privacy-Preserving Data Mining: Models and Algorithms. Springer

  2. [2]

    Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, and Pasin Manurangsi

  3. [3]

    InProceedings of EMNLP 2022

    Large-scale differentially private BERT. InProceedings of EMNLP 2022. 6481–6491

  4. [4]

    Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert- Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. 2021. Extracting training data from large language models. InProceedings of the 30th USENIX Security Symposium. 2633–2650

  5. [5]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT 2019. 4171–4186

  6. [6]

    Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Cali- brating noise to sensitivity in private data analysis. InTheory of Cryptography Conference. Springer, 265–284

  7. [7]

    Oluwaseun Feyisetan, Borja Balle, Thomas Drake, and Tom Diethe. 2020. Privacy- and utility-preserving textual analysis via calibrated multivariate perturbations. InProceedings of WSDM 2020. 178–186. doi:10.1145/3336191.3371856

  8. [8]

    Benjamin C. M. Fung, Ke Wang, Rui Chen, and Philip S. Yu. 2010. Privacy- preserving data publishing: A survey of recent developments.Comput. Surveys 42, 4 (2010), 1–53

  9. [9]

    2010.HMAC-based Extract-and-Expand Key Derivation Function (HKDF)

    Hugo Krawczyk and Pasi Eronen. 2010.HMAC-based Extract-and-Expand Key Derivation Function (HKDF). Technical Report RFC 5869. IETF

  10. [10]

    John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. InProceedings of ICML 2001. 282–289. doi:10.5555/645530.655813

  11. [11]

    Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proceedings of NAACL-HLT 2016. 260–270

  12. [12]

    Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. 2007. t-closeness: Privacy beyond k-anonymity and l-diversity. InProceedings of ICDE 2007. 106–115. doi:10.1109/ICDE.2007.367856

  13. [13]

    Pierre Lison, Ildikó Pilán, David Sánchez, Montserrat Batet, and Lilja Øvrelid

  14. [14]

    Anonymisation models for text data: State of the art, challenges and future directions. InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 4188–4203. doi:10.18653/v1/2021.acl- long.323

  15. [15]

    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach.arXiv preprint arXiv:1907.11692 (2019)

  16. [16]

    Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi- directional LSTM-CNNs-CRF. InProceedings of ACL 2016. 1064–1074

  17. [17]

    Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrish- nan Venkitasubramaniam. 2007. l-diversity: Privacy beyond k-anonymity.ACM Transactions on Knowledge Discovery from Data1, 1 (2007), 3

  18. [18]

    Microsoft. 2020. Presidio — data protection and anonymization API. Microsoft Open Source. https://github.com/microsoft/presidio

  19. [19]

    Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou, Yulia Tsvetkov, Maarten Sap, Reza Shokri, and Yejin Choi. 2024. Can LLMs keep a secret? Testing privacy implications of language models via contextual integrity theory. InInternational Conference on Learning Representations (ICLR 2024). https://arxiv.org/abs/2310. 17884

  20. [20]

    Arvind Narayanan and Vitaly Shmatikov. 2008. Robust de-anonymization of large sparse datasets. InProceedings of the 2008 IEEE Symposium on Security and Privacy. 111–125

  21. [21]

    Douglass, Li-wei H

    Ishna Neamatullah, Margaret M. Douglass, Li-wei H. Lehman, Andrew Reisner, Mauricio Villarroel, William J. Long, Peter Szolovits, George B. Moody, Roger G. Mark, and Gari D. Clifford. 2008. Automated de-identification of free-text medical records.BMC Medical Informatics and Decision Making8, 1 (2008), 1–17. doi:10. 1186/1472-6947-8-32

  22. [22]

    Helen Nissenbaum. 2004. Privacy as contextual integrity.Washington Law Review 79, 1 (2004), 119–158

  23. [23]

    Chen Qu, Weize Kong, Liu Yang, Mingyang Zhang, Michael Bendersky, and Marc Najork. 2021. Natural language understanding with privacy-preserving BERT. InProceedings of the 30th ACM International Conference on Information and Knowledge Management. 1488–1497. doi:10.1145/3459637.3482281

  24. [24]

    Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using siamese BERT-networks. InProceedings of EMNLP-IJCNLP 2019. 3982–3992

  25. [25]

    Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. Dis- tilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.arXiv preprint arXiv:1910.01108(2019)

  26. [26]

    Amber Stubbs and Özlem Uzuner. 2015. Annotating longitudinal clinical narra- tives for de-identification: The 2014 i2b2/UTHealth corpus.Journal of Biomedical Informatics58 (2015), S20–S29

  27. [27]

    2000.Simple demographics often identify people uniquely

    Latanya Sweeney. 2000.Simple demographics often identify people uniquely. Technical Report Data Privacy Working Paper 3. Carnegie Mellon University, Pittsburgh, PA

  28. [28]

    Latanya Sweeney. 2002. k-anonymity: A model for protecting privacy.Inter- national Journal of Uncertainty, Fuzziness and Knowledge-Based Systems10, 05 (2002), 557–570

  29. [29]

    Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-t...

  30. [30]

    Weinberger, and Yoav Artzi

    Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi

  31. [31]

    Building

    BERTScore: Evaluating text generation with BERT. InProceedings of ICLR 2020. A Surrogate Generation Mechanisms Table 10 documents the surrogate generator used by MimicGen for each PII type detected by SurrogateShield, together with the format guarantee and a representative example value. All surrogates are generated via Faker with a per-session unique- ne...