PII Shield: A Browser-Level Overlay for User-Controlled Personal Identifiable Information (PII) Management in AI Interactions

Max Holschneider; Saetbyeol LeeYouk

arxiv: 2603.24895 · v2 · submitted 2026-03-26 · 💻 cs.HC

Pith reviewed 2026-05-15 01:16 UTC · model grok-4.3

classification 💻 cs.HC

keywords PII management · browser extension · AI privacy · data anonymization · smokescreen agents · user-controlled data · LLM interactions · privacy overlay

The pith

A browser overlay lets users anonymize personal details in AI chats locally, while autonomous agents generate decoy activity to disrupt profiling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a browser-based system that applies enterprise-style redaction to everyday AI interactions by replacing sensitive personal information with anonymous stand-ins before any data leaves the user's device. It adds a second layer of protection through autonomous agents that generate unrelated activity patterns to interfere with third-party attempts to build user profiles from chat logs. A reader would care because growing numbers of people now share deeply personal information with cloud AI services without any practical way to limit what gets retained or reused. If the approach works, ordinary users could retain the benefits of AI coaching or therapy while regaining meaningful control over their data.

Core claim

The paper claims that a consumer-facing browser overlay can prevent data leakage and profiling during AI conversations by performing entity anonymization entirely on the user's machine and by deploying independent agent-driven smokescreen activities that mask real usage patterns, thereby extending enterprise privacy techniques to individual users in an accessible form.

What carries the argument

Local entity anonymization, which identifies and substitutes personal details in real time within the browser before transmission, combined with smokescreen agents that autonomously simulate unrelated queries and actions to interfere with profiling.
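As a rough illustration of the substitution step, the sketch below replaces detected entities with numbered placeholders and keeps a local mapping so the AI's reply can be restored on the user's side. It is not the authors' implementation: the regex patterns, the placeholder format, and the names `PATTERNS`, `anonymize`, and `restore` are all assumptions made for this example, and the paper's actual entity detector is not described here.

```python
import re

# Hypothetical detectors; a real system would likely use a richer entity model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def anonymize(text):
    """Replace detected entities with placeholders; return (text, mapping)."""
    mapping = {}   # placeholder -> original value (never leaves the device)
    reverse = {}   # original value -> placeholder, so repeats reuse one token
    counters = {}  # per-label counter used to number placeholders
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            value = match.group(0)
            if value not in reverse:
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"[{label}_{counters[label]}]"
                reverse[value] = placeholder
                mapping[placeholder] = value
            return reverse[value]
        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back into text that contains placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


redacted, mapping = anonymize("Reach me at jane.doe@example.com or +1 415 555 0100 today")
print(redacted)
```

Only the redacted string would be transmitted in this sketch; the mapping stays in local memory, which is what makes the round trip reversible for the user but not for the service.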

If this is right

Users can discuss personal topics with AI services without transmitting identifiable information.
Third-party profiling based on chat history becomes less reliable because of the added noise from smokescreen activities.
The same interface works across multiple web-based AI chat platforms without requiring changes to those services.
The open-source release allows others to inspect, modify, or extend the protection mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local-redaction pattern could be applied to other web forms and services that collect personal data.
Widespread adoption might reduce the volume of usable training data available to AI providers from chat logs.
Users could gain a clearer mental model of data flows once the anonymization step is visible in their browser.
Further work could explore ways to let users tune the strength of anonymization depending on the sensitivity of each conversation.

Load-bearing premise

Replacing real personal details with anonymized versions will still leave enough context for the AI to deliver useful responses rather than generic or broken ones.

What would settle it

Compare the accuracy and relevance of AI answers when the same personal queries are sent directly versus through the anonymized system, and test whether external attempts to link multiple sessions to a single user succeed or fail when smokescreen activity is present.
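The session-linking half of that test could be prototyped with a toy adversary like the one below. This is a sketch under strong assumptions: sessions are reduced to sets of topic keywords, the adversary links two sessions when their Jaccard overlap reaches a threshold, and the keyword sets, the 0.5 threshold, and the function names are all invented for illustration. A real adversary would likely be far more capable.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of keywords."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def linked(session_a, session_b, threshold=0.5):
    """Toy adversary: link sessions whose keyword overlap meets the threshold."""
    return jaccard(session_a, session_b) >= threshold


# Two real sessions from the same (hypothetical) user.
monday = {"insomnia", "therapy", "anxiety"}
friday = {"insomnia", "anxiety", "medication"}

# The same sessions with unrelated decoy topics mixed in.
monday_noisy = monday | {"recipes", "football"}
friday_noisy = friday | {"mortgage", "hiking"}

print(linked(monday, friday))              # overlap 2/4
print(linked(monday_noisy, friday_noisy))  # overlap 2/8
```

In this toy setup the decoys dilute the overlap below the threshold, but an adversary that can filter out decoy traffic would see the original overlap again, which is why the distinguishability of smokescreen activity matters.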

Original abstract

AI chatbots have quietly become the world's most popular therapists, coaches, and confidants. Users of cloud-based LLM services are increasingly shifting from simple queries like idea generation and poem writing, to deeply personal interactions. As Large Language Models increasingly assume the role of our confessors, we are witnessing a massive, unregulated transfer of sensitive personal identifiable information (PII) to powerful tech companies with opaque privacy practices. While the enterprise sector has made great strides in addressing data leakage concerns through sophisticated guardrails and PII redaction pipelines, these powerful tools have functionally remained inaccessible for the average user due to their technical complexity. This results in a dangerous trade off for individual users. In order to receive the therapeutic or productivity benefits of AI, users need to abandon any agency they might otherwise have over their data, often without a clear mental model of what is being shared, and how it might be used for advertising later on. This work addresses this interaction gap, applying the redaction pipelines of enterprise-grade redaction into an intuitive, first-of-its-kind, consumer-facing, and free experience. Specifically, this work introduces a scalable, browser-based intervention designed to help align user behavior with their privacy preferences during web-based AI interactions. Our system introduces two key mechanisms: local entity anonymization to prevent data leakage, and 'smokescreens': autonomous agent activity to disrupt third-party profiling. An open-source implementation is accessible at the following GitHub Repository: https://github.com/SBleeyouk/PII_Shield.git

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

PII Shield describes a browser overlay for local PII redaction plus smokescreen agents, but reports no tests of whether redaction preserves response utility or whether smokescreen traffic resists detection.

The paper's core contribution is a practical browser extension that redacts personal details locally before they reach cloud LLMs and then runs fake agent sessions to blur profiling signals. That combination is new in a consumer setting, even if the individual pieces draw from enterprise redaction work. The open-source release at the linked repo is a clear plus; anyone can install it and see how the overlay sits on top of existing chat interfaces without changing the backend models.

The architecture description is clear enough that a reader can follow the flow from user input through entity detection, replacement, and the separate smokescreen traffic generator. That part is useful for people who want to prototype similar client-side controls.

The main weakness is the complete absence of any measurement. There are no similarity scores between original and redacted prompts, no task-success rates for the downstream LLM, and no logs or adversary model showing whether the smokescreen traffic looks different from real sessions. The claims that context survives redaction and that profiling is meaningfully disrupted rest only on the design choices. A reader who cares about real-world privacy impact therefore has to take those outcomes on faith.

This work is aimed at tool builders and privacy-minded users who want something they can run today rather than at theorists or large-scale evaluators. It is coherent on its own terms and shows honest engagement with the gap between enterprise guardrails and consumer AI use. A serious editor should send it to review so the authors can add the missing quantitative checks; the idea is worth testing even if the current draft is only a starting point.

Referee Report

3 major / 2 minor

Summary. The paper presents PII Shield, a browser-level overlay system for user-controlled PII management during web-based AI chatbot interactions. It introduces two mechanisms: local entity anonymization to prevent data leakage to LLM providers while retaining sufficient context for useful responses, and 'smokescreens' consisting of autonomous agent-generated activity to disrupt third-party profiling. The work positions itself as a consumer-accessible, open-source implementation bridging enterprise redaction tools and individual users, with a GitHub repository provided.

Significance. If the mechanisms are shown to be effective, the system could meaningfully advance user agency over sensitive data in increasingly personal AI interactions, addressing an important gap between opaque cloud LLM practices and practical privacy controls. The open-source release is a positive step toward reproducibility and adoption in the HCI community.

major comments (3)

[Abstract] Abstract and overall system description: claims that local entity anonymization prevents leakage while preserving response utility, and that smokescreens meaningfully disrupt profiling, rest solely on design assertions with no supporting user studies, performance measurements, or effectiveness data.
[System Mechanisms] No quantitative evaluation is supplied for the anonymization mechanism, such as semantic similarity scores, task-success rates, or human-rated utility metrics comparing original versus anonymized prompts; this directly undermines the central assumption that context is sufficiently retained.
[Smokescreen Agents] The smokescreen component lacks any measurement of distinguishability (e.g., statistical tests on interaction logs or adversary simulation) or profiling disruption efficacy, leaving the second key claim unsupported by evidence.
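The similarity scores asked for in the [System Mechanisms] comment could be prototyped along the following lines. This is only a sketch: it uses cosine similarity over bag-of-words counts as a dependency-free stand-in for the sentence-embedding similarity a real evaluation would more plausibly use, and the example prompts and the `[PERSON_1]` placeholder format are assumptions, not taken from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


original = "my manager Alice ignored my email"
redacted = "my manager [PERSON_1] ignored my email"
print(cosine_similarity(original, redacted))
```

A high score here only shows that the surface wording survived redaction. Whether the downstream model's answers stay useful is a separate question, which is why the comment also asks for task-success rates and human ratings.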

minor comments (2)

[Discussion] The manuscript would benefit from explicit discussion of potential failure modes, such as anonymization errors that alter query intent or smokescreen patterns that could be filtered by sophisticated adversaries.
[Implementation] Installation and usage instructions for the open-source GitHub implementation could be expanded with screenshots or step-by-step examples to improve accessibility for HCI practitioners.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review of our manuscript on PII Shield. We address each major comment point by point below, clarifying the design rationale while acknowledging the need for stronger empirical support in certain areas.

Referee: [Abstract] Abstract and overall system description: claims that local entity anonymization prevents leakage while preserving response utility, and that smokescreens meaningfully disrupt profiling, rest solely on design assertions with no supporting user studies, performance measurements, or effectiveness data.

Authors: The manuscript frames PII Shield primarily as a systems and HCI contribution describing a practical browser-based implementation. The claims regarding leakage prevention follow directly from the local-only processing architecture, which ensures no PII is transmitted to external services. Utility preservation is supported by the context-aware replacement strategy detailed in the system section, which substitutes entities while retaining syntactic and semantic structure. We agree that user studies and quantitative metrics would provide valuable additional validation. In the revised manuscript we will expand the Discussion section to include a proposed evaluation framework with example semantic similarity calculations and task-success scenarios. revision: partial
Referee: [System Mechanisms] No quantitative evaluation is supplied for the anonymization mechanism, such as semantic similarity scores, task-success rates, or human-rated utility metrics comparing original versus anonymized prompts; this directly undermines the central assumption that context is sufficiently retained.

Authors: We acknowledge that the current version does not include quantitative metrics such as embedding-based similarity scores or task-success rates. The paper instead emphasizes the engineering details of the local anonymization pipeline and its integration as a browser overlay. To address this gap, the revision will add a dedicated subsection presenting preliminary quantitative examples (e.g., cosine similarity on sentence embeddings for representative prompts) and will outline a concrete plan for human-rated utility assessment. These additions will be incorporated without altering the core system description. revision: yes
Referee: [Smokescreen Agents] The smokescreen component lacks any measurement of distinguishability (e.g., statistical tests on interaction logs or adversary simulation) or profiling disruption efficacy, leaving the second key claim unsupported by evidence.

Authors: The smokescreen mechanism is realized through autonomous agents that inject plausible synthetic activity to increase the entropy of observable interaction logs. The manuscript presents this as a design-level defense rather than an empirically validated one. We will revise the relevant section to include a qualitative characterization of the generated activity patterns and a discussion of potential future metrics, such as simulated adversary inference accuracy before and after smokescreen application. This will clarify the intended contribution while noting the absence of full-scale profiling simulations in the present work. revision: partial
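The entropy argument in the last response can be made concrete with a small calculation. The sketch below computes the Shannon entropy of the topic distribution in an interaction log before and after decoy queries are mixed in. The topic labels and counts are invented for illustration, and treating each query as a single topic label is a simplifying assumption.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical log of real queries, dominated by one sensitive topic.
real_log = ["health"] * 6 + ["career"] * 2

# Hypothetical decoy queries spread over unrelated topics.
decoys = ["travel", "cooking", "sports", "finance"] * 2

print(shannon_entropy(real_log))           # about 0.81 bits
print(shannon_entropy(real_log + decoys))  # about 2.41 bits
```

Higher entropy in the observed log is a necessary sign that the decoys add noise, not a sufficient one: if an adversary can tell decoy queries from real ones, the effective entropy falls back to that of the real log, which is the distinguishability concern the referee raises.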

Circularity Check

0 steps flagged

No circularity: direct system description without derivations or self-referential logic

The paper presents an engineering system (browser overlay with local entity anonymization and smokescreen agents) as a direct architectural contribution. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text. Claims reduce to implementation details and open-source code rather than any self-definition or self-citation load-bearing step. This is a standard non-circular system paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The system rests on standard browser extension APIs for traffic interception and local processing; the smokescreen concept is introduced without independent prior validation.

axioms (1)

domain assumption: Browser extension APIs permit reliable local interception and modification of web requests without server involvement.
Invoked implicitly when describing local entity anonymization.

invented entities (1)

smokescreens · no independent evidence
purpose: Autonomous agent activity to disrupt third-party profiling of AI queries.
New term and mechanism introduced in the paper to address profiling concerns.

pith-pipeline@v0.9.0 · 5584 in / 1124 out tokens · 42104 ms · 2026-05-15T01:16:04.218203+00:00 · methodology
