
arxiv: 2603.09781 · v2 · pith:CBSO3EW3new · submitted 2026-03-10 · 💻 cs.CR

CLIOPATRA: Extracting Private Information from LLM Insights

classification 💻 cs.CR
keywords cliopatra · information · insights · llm-based · privacy · chats · extract · protections

The widespread adoption of AI assistants has prompted the development of privacy-aware platforms designed to extract insights from real-world usage. Their privacy protections primarily rely on layering multiple heuristic techniques, such as PII redaction, clustering, aggregation, and LLM-based privacy auditing. In this paper, we put their privacy claims to the test by presenting CLIOPATRA, the first attack against "privacy-preserving" LLM-based insights systems. Our attack involves an adversary that carefully designs and inserts malicious chats into the system to break multiple layers of protections and induce the leakage of sensitive information from a target user's chat. We evaluate CLIOPATRA on one such platform, Anthropic's Clio, and target synthetically generated medical chats to show that an adversary can successfully and confidently (with nearly 100% precision) extract the medical history contained in these chats in up to 65% of cases. We also show that CLIOPATRA can stealthily extract information by obfuscating the private information in the generated insights. Finally, we demonstrate that existing ad hoc mitigations, such as LLM-based privacy auditing, are unreliable and fail to detect major leaks. Taken together, our findings indicate that, even when layered, current heuristic protections are insufficient to adequately protect user data, and that prompt injection has been an understudied risk in LLM-based insight systems.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Reconstruction of Personally Identifiable Information from Supervised Finetuned Models

    cs.CR 2026-05 unverdicted novelty 7.0

    PII can be reconstructed from SFT models via prefix attacks, with the new COVA algorithm improving success rates and leakage varying by attacker knowledge and PII type.