
arxiv: 2603.24895 · v2 · submitted 2026-03-26 · 💻 cs.HC

PII Shield: A Browser-Level Overlay for User-Controlled Personal Identifiable Information (PII) Management in AI Interactions

Pith reviewed 2026-05-15 01:16 UTC · model grok-4.3

classification 💻 cs.HC
keywords: PII management, browser extension, AI privacy, data anonymization, smokescreen agents, user-controlled data, LLM interactions, privacy overlay

The pith

A browser overlay lets users anonymize personal details in AI chats locally while sending autonomous fake activity to disrupt profiling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a browser-based system that applies enterprise-style redaction to everyday AI interactions by replacing sensitive personal information with anonymous stand-ins before any data leaves the user's device. It adds a second layer of protection through autonomous agents that generate unrelated activity patterns to interfere with third-party attempts to build user profiles from chat logs. A reader would care because growing numbers of people now share deeply personal information with cloud AI services without any practical way to limit what gets retained or reused. If the approach works, ordinary users could retain the benefits of AI coaching or therapy while regaining meaningful control over their data.

Core claim

The paper claims that a consumer-facing browser overlay can prevent data leakage and profiling during AI conversations by performing entity anonymization entirely on the user's machine and by deploying independent agent-driven smokescreen activities that mask real usage patterns, thereby extending enterprise privacy techniques to individual users in an accessible form.

What carries the argument

Local entity anonymization, which identifies and substitutes personal details in real time within the browser before transmission, combined with smokescreen agents that autonomously simulate unrelated queries and actions to interfere with profiling.
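The anonymize-then-restore loop can be pictured with a minimal Python sketch. It is not taken from the PII Shield repository. The regex detectors, the user-declared name list, and the names Anonymizer, anonymize and restore are hypothetical stand-ins for whatever entity recognizer the extension actually uses. The property it illustrates is that the placeholder-to-original mapping stays on the device. The model only ever sees placeholders, and its reply is de-anonymized locally.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors: simple regexes stand in for the real entity
# recognizer, which the paper does not specify here.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

@dataclass
class Anonymizer:
    known_names: list[str] = field(default_factory=list)  # names the user declares sensitive
    mapping: dict[str, str] = field(default_factory=dict)  # placeholder -> original, kept locally
    _counts: dict[str, int] = field(default_factory=dict)  # next index per entity kind

    def _placeholder(self, kind: str, value: str) -> str:
        # Reuse one placeholder per distinct value so the model can still track who is who.
        for ph, orig in self.mapping.items():
            if orig == value:
                return ph
        self._counts[kind] = self._counts.get(kind, 0) + 1
        ph = f"[{kind}_{self._counts[kind]}]"
        self.mapping[ph] = value
        return ph

    def anonymize(self, text: str) -> str:
        # Structured identifiers first, so a name embedded in an email is not split.
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(lambda m, k=kind: self._placeholder(k, m.group(0)), text)
        for name in self.known_names:
            text = re.sub(rf"\b{re.escape(name)}\b",
                          lambda m: self._placeholder("PERSON", m.group(0)), text)
        return text

    def restore(self, text: str) -> str:
        # Runs on the model's reply in the browser; originals never left the device.
        for ph, orig in self.mapping.items():
            text = text.replace(ph, orig)
        return text

shield = Anonymizer(known_names=["Dana"])
masked = shield.anonymize("Dana emailed me at dana.k@example.com, call 555-123-4567. Dana is upset.")
print(masked)
# [PERSON_1] emailed me at [EMAIL_1], call [PHONE_1]. [PERSON_1] is upset.
print(shield.restore("Try telling [PERSON_1] how you feel."))
# Try telling Dana how you feel.
```

Both mentions of "Dana" map to the same placeholder. That consistency is what would let the model keep enough context to answer usefully, which is the premise the paper's utility claim depends on.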

If this is right

  • Users can discuss personal topics with AI services without transmitting identifiable information.
  • Third-party profiling based on chat history becomes less reliable because of the added noise from smokescreen activities.
  • The same interface works across multiple web-based AI chat platforms without requiring changes to those services.
  • The open-source release allows others to inspect, modify, or extend the protection mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same local-redaction pattern could be applied to other web forms and services that collect personal data.
  • Widespread adoption might reduce the volume of usable training data available to AI providers from chat logs.
  • Users could gain a clearer mental model of data flows once the anonymization step is visible in their browser.
  • Further work could explore ways to let users tune the strength of anonymization depending on the sensitivity of each conversation.

Load-bearing premise

Replacing real personal details with anonymized versions will still leave enough context for the AI to deliver useful responses rather than generic or broken ones.

What would settle it

Compare the accuracy and relevance of AI answers when the same personal queries are sent directly versus through the anonymized system, and test whether external attempts to link multiple sessions to a single user succeed or fail when smokescreen activity is present.
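The utility half of this comparison could be scored along the lines of the Python sketch below. Cosine similarity over sentence embeddings is one natural score. To stay dependency-free, this sketch swaps in bag-of-words term-frequency vectors, and the function names (cosine_similarity, utility_retention) are hypothetical. Each entry in direct_replies is assumed to be the model's answer to a prompt sent as-is. The matching entry in shielded_replies is its answer to the anonymized prompt, after placeholders have been restored locally.

```python
import math
import re
from collections import Counter

def term_vector(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term vectors (0.0 if either is empty)."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_sq = sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0

def utility_retention(direct_replies: list[str], shielded_replies: list[str]) -> float:
    """Mean similarity over paired replies; 1.0 means identical word distributions."""
    if not direct_replies or len(direct_replies) != len(shielded_replies):
        raise ValueError("need equally sized, non-empty lists of paired replies")
    scores = [cosine_similarity(d, s) for d, s in zip(direct_replies, shielded_replies)]
    return sum(scores) / len(scores)
```

Lexical overlap rewards surface wording. A real study would therefore pair an embedding-based score with human-rated utility judgments. The pairing and averaging step stays the same either way.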

Original abstract

AI chatbots have quietly become the world's most popular therapists, coaches, and confidants. Users of cloud-based LLM services are increasingly shifting from simple queries like idea generation and poem writing to deeply personal interactions. As Large Language Models increasingly assume the role of our confessors, we are witnessing a massive, unregulated transfer of sensitive personal identifiable information (PII) to powerful tech companies with opaque privacy practices. While the enterprise sector has made great strides in addressing data leakage concerns through sophisticated guardrails and PII redaction pipelines, these powerful tools have functionally remained inaccessible for the average user due to their technical complexity. This results in a dangerous trade-off for individual users. In order to receive the therapeutic or productivity benefits of AI, users need to abandon any agency they might otherwise have over their data, often without a clear mental model of what is being shared, and how it might be used for advertising later on. This work addresses this interaction gap, bringing enterprise-grade redaction pipelines into an intuitive, first-of-its-kind, consumer-facing, and free experience. Specifically, this work introduces a scalable, browser-based intervention designed to help align user behavior with their privacy preferences during web-based AI interactions. Our system introduces two key mechanisms: local entity anonymization to prevent data leakage, and 'smokescreens': autonomous agent activity to disrupt third-party profiling. An open-source implementation is accessible at the following GitHub Repository: https://github.com/SBleeyouk/PII_Shield.git

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents PII Shield, a browser-level overlay system for user-controlled PII management during web-based AI chatbot interactions. It introduces two mechanisms: local entity anonymization to prevent data leakage to LLM providers while retaining sufficient context for useful responses, and 'smokescreens' consisting of autonomous agent-generated activity to disrupt third-party profiling. The work positions itself as a consumer-accessible, open-source implementation bridging enterprise redaction tools and individual users, with a GitHub repository provided.

Significance. If the mechanisms are shown to be effective, the system could meaningfully advance user agency over sensitive data in increasingly personal AI interactions, addressing an important gap between opaque cloud LLM practices and practical privacy controls. The open-source release is a positive step toward reproducibility and adoption in the HCI community.

major comments (3)
  1. [Abstract] Abstract and overall system description: claims that local entity anonymization prevents leakage while preserving response utility, and that smokescreens meaningfully disrupt profiling, rest solely on design assertions with no supporting user studies, performance measurements, or effectiveness data.
  2. [System Mechanisms] No quantitative evaluation is supplied for the anonymization mechanism, such as semantic similarity scores, task-success rates, or human-rated utility metrics comparing original versus anonymized prompts; this directly undermines the central assumption that context is sufficiently retained.
  3. [Smokescreen Agents] The smokescreen component lacks any measurement of distinguishability (e.g., statistical tests on interaction logs or adversary simulation) or profiling disruption efficacy, leaving the second key claim unsupported by evidence.
minor comments (2)
  1. [Discussion] The manuscript would benefit from explicit discussion of potential failure modes, such as anonymization errors that alter query intent or smokescreen patterns that could be filtered by sophisticated adversaries.
  2. [Implementation] Installation and usage instructions for the open-source GitHub implementation could be expanded with screenshots or step-by-step examples to improve accessibility for HCI practitioners.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review of our manuscript on PII Shield. We address each major comment point by point below, clarifying the design rationale while acknowledging the need for stronger empirical support in certain areas.

Point-by-point responses
  1. Referee: [Abstract] Abstract and overall system description: claims that local entity anonymization prevents leakage while preserving response utility, and that smokescreens meaningfully disrupt profiling, rest solely on design assertions with no supporting user studies, performance measurements, or effectiveness data.

    Authors: The manuscript frames PII Shield primarily as a systems and HCI contribution describing a practical browser-based implementation. The claims regarding leakage prevention follow directly from the local-only processing architecture, which ensures no PII is transmitted to external services. Utility preservation is supported by the context-aware replacement strategy detailed in the system section, which substitutes entities while retaining syntactic and semantic structure. We agree that user studies and quantitative metrics would provide valuable additional validation. In the revised manuscript we will expand the Discussion section to include a proposed evaluation framework with example semantic similarity calculations and task-success scenarios. revision: partial

  2. Referee: [System Mechanisms] No quantitative evaluation is supplied for the anonymization mechanism, such as semantic similarity scores, task-success rates, or human-rated utility metrics comparing original versus anonymized prompts; this directly undermines the central assumption that context is sufficiently retained.

    Authors: We acknowledge that the current version does not include quantitative metrics such as embedding-based similarity scores or task-success rates. The paper instead emphasizes the engineering details of the local anonymization pipeline and its integration as a browser overlay. To address this gap, the revision will add a dedicated subsection presenting preliminary quantitative examples (e.g., cosine similarity on sentence embeddings for representative prompts) and will outline a concrete plan for human-rated utility assessment. These additions will be incorporated without altering the core system description. revision: yes

  3. Referee: [Smokescreen Agents] The smokescreen component lacks any measurement of distinguishability (e.g., statistical tests on interaction logs or adversary simulation) or profiling disruption efficacy, leaving the second key claim unsupported by evidence.

    Authors: The smokescreen mechanism is realized through autonomous agents that inject plausible synthetic activity to increase the entropy of observable interaction logs. The manuscript presents this as a design-level defense rather than an empirically validated one. We will revise the relevant section to include a qualitative characterization of the generated activity patterns and a discussion of potential future metrics, such as simulated adversary inference accuracy before and after smokescreen application. This will clarify the intended contribution while noting the absence of full-scale profiling simulations in the present work. revision: partial
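The "simulated adversary inference accuracy before and after smokescreen application" metric can be prototyped in a few lines. The Python sketch below is hypothetical throughout, and none of its parameters come from the paper:
  • queries are reduced to topic labels
  • the simulated user puts about 70% of queries on one sensitive topic
  • the smokescreen agent injects a configurable number of decoys per real query, built around a single cover topic
  • the adversary simply guesses the most frequent topic in the observed log

```python
import random
from collections import Counter

# Hypothetical topic space; real smokescreens would emit actual queries and actions.
TOPICS = ["health", "finance", "travel", "cooking", "sports", "gardening"]

def user_session(true_topic: str, n: int, rng: random.Random) -> list[str]:
    """Simulated real user: about 70% of queries on one sensitive topic (assumed)."""
    return [true_topic if rng.random() < 0.7 else rng.choice(TOPICS) for _ in range(n)]

def add_smokescreen(log: list[str], ratio: int, rng: random.Random) -> list[str]:
    """Inject ratio * len(log) decoys around one cover topic, then interleave them."""
    # The local agent can see the user's own traffic, so it avoids the dominant topic.
    dominant = Counter(log).most_common(1)[0][0]
    cover = rng.choice([t for t in TOPICS if t != dominant])
    decoys = [cover if rng.random() < 0.8 else rng.choice(TOPICS)
              for _ in range(ratio * len(log))]
    mixed = log + decoys
    rng.shuffle(mixed)  # the observer sees one undifferentiated stream
    return mixed

def adversary_guess(log: list[str]) -> str:
    """Naive profiler: assume the most frequent topic is the user's real interest."""
    return Counter(log).most_common(1)[0][0]

def inference_accuracy(ratio: int, trials: int = 200, n: int = 20, seed: int = 0) -> float:
    """Fraction of simulated sessions in which the profiler recovers the true topic."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        true_topic = rng.choice(TOPICS)
        log = user_session(true_topic, n, rng)
        if ratio > 0:
            log = add_smokescreen(log, ratio, rng)
        hits += adversary_guess(log) == true_topic
    return hits / trials

print("no smokescreen:", inference_accuracy(ratio=0))
print("2 decoys per real query:", inference_accuracy(ratio=2))
```

Against this naive profiler, the decoys flip the guess to the cover topic. The failure mode raised in the minor comments still applies. An adversary that can separate agent-generated queries from human ones, for example by timing, phrasing, or the absence of follow-ups, would filter the decoys out before counting, so a fuller evaluation would need a stronger simulated adversary.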

Circularity Check

0 steps flagged

No circularity: direct system description without derivations or self-referential logic

full rationale

The paper presents an engineering system (browser overlay with local entity anonymization and smokescreen agents) as a direct architectural contribution. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text. Its claims rest on implementation details and open-source code, not on self-definitions or load-bearing self-citations. This is a standard non-circular system paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The system rests on standard browser extension APIs for traffic interception and local processing; the smokescreen concept is introduced without independent prior validation.

axioms (1)
  • domain assumption: Browser extension APIs permit reliable local interception and modification of web requests without server involvement.
    Invoked implicitly when describing local entity anonymization.
invented entities (1)
  • smokescreens: no independent evidence
    purpose: Autonomous agent activity to disrupt third-party profiling of AI queries.
    New term and mechanism introduced in the paper to address profiling concerns.

pith-pipeline@v0.9.0 · 5584 in / 1124 out tokens · 42104 ms · 2026-05-15T01:16:04.218203+00:00 · methodology
