Inferential Privacy Leakage in Anonymized Conversational AI Logs
Pith reviewed 2026-05-25 02:42 UTC · model grok-4.3
The pith
Even after an LLM filter removes all explicit demographic statements, an off-the-shelf model still recovers user age, gender, and country from ChatGPT histories at weighted F1 scores of 0.84, 0.90, and 0.88.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On the filtered cohort, whose conversations contain no messages flagged by an LLM-based filter for explicit demographic self-identification, an off-the-shelf large language model recovers each user's age, gender, and country at weighted F1 scores of 0.84, 0.90, and 0.88 respectively, with the median user identified from the first 5 percent of their conversation history. The model's natural-language reasoning traces reveal four recurring stereotype patterns that drive both successful inference and an asymmetric error distribution. Inference from ChatGPT histories is also competitive with inference from the same users' Google Search and YouTube histories.
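For readers unfamiliar with the metric, weighted F1 is the mean of per-class F1 scores, each weighted by the number of true instances of that class. The sketch below is a minimal pure-Python illustration; the function name and the toy age-band labels are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score

# Toy labels invented purely for illustration (not data from the paper).
y_true = ["18-24", "18-24", "25-34", "25-34", "35+"]
y_pred = ["18-24", "25-34", "25-34", "25-34", "35+"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.787
```

Because each class is weighted by its support, a high weighted F1 can coexist with poor performance on small classes, which is one reason the per-group error asymmetry matters.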
What carries the argument
LLM inference from implicit conversational patterns, run on a cohort filtered to exclude conversations containing any message flagged for direct demographic self-identification.
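The cohort restriction can be pictured as a user-level exclusion: a user is dropped if any of their messages is flagged. The sketch below assumes that reading of the abstract; `filtered_cohort`, `toy_filter`, and the keyword cues are hypothetical stand-ins, since the paper's actual LLM-based filter and prompt are not reproduced here.

```python
def filtered_cohort(histories, is_explicit):
    """Keep only users with no message flagged as explicit self-identification."""
    return {
        user: messages
        for user, messages in histories.items()
        if not any(is_explicit(m) for m in messages)
    }

# Hypothetical keyword stand-in for the paper's LLM-based filter.
def toy_filter(message):
    text = message.lower()
    return any(cue in text for cue in ("i am a woman", "years old", "i live in"))

# Invented example histories (not data from the paper).
histories = {
    "u1": ["Help me debug this Kotlin coroutine", "Explain JAMB cut-off marks"],
    "u2": ["I am 31 years old, plan my workout", "Suggest a recipe"],
}
cohort = filtered_cohort(histories, toy_filter)
print(sorted(cohort))  # ['u1']
```

Note that the retained user still mentions a country-specific exam, so implicit cues survive even when every explicit statement is removed. That is the leakage channel the paper measures.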
If this is right
- Message-level PII removal alone does not prevent demographic recovery from conversational histories.
- Inference from ChatGPT logs is competitive with inference from the same users' search and video-watch histories, the substrates that have driven behavioral advertising for two decades.
- Four recurring stereotype patterns explain both high accuracy and systematic errors on specific user groups.
- The median user can be identified from only the first five percent of their conversation history.
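One possible way to operationalize "identified from the first five percent" is to find, per user, the smallest history prefix on which the model's prediction matches the true label, then take the median across users. The sketch below is written under that assumption; the paper's exact procedure is not stated in this summary, and `first_identifying_fraction`, `toy_predict`, and the sample messages are hypothetical.

```python
from statistics import median

def first_identifying_fraction(messages, true_label, predict, steps=20):
    """Smallest tested prefix fraction at which predict(prefix) == true_label.

    Returns None if no tested prefix yields the correct label.
    """
    n = len(messages)
    for k in range(1, steps + 1):
        size = max(1, (n * k) // steps)
        if predict(messages[:size]) == true_label:
            return k / steps
    return None

# Hypothetical predictor standing in for the off-the-shelf LLM.
def toy_predict(prefix):
    return "NG" if any("naira" in m.lower() for m in prefix) else "unknown"

# Invented histories with their true country labels (not data from the paper).
users = {
    "u1": (["hi", "convert 5000 naira to usd", "thanks", "bye"], "NG"),
    "u2": (["price in naira?", "ok", "ok", "ok"], "NG"),
}
fractions = [
    first_identifying_fraction(msgs, label, toy_predict, steps=4)
    for msgs, label in users.values()
]
print(fractions, median(f for f in fractions if f is not None))  # [0.5, 0.25] 0.375
```

This variant records the first correct prediction only; a stricter definition would require the prediction to stay correct on all longer prefixes.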
Where Pith is reading between the lines
- Platforms releasing conversational logs for research or training may need conversation-level or user-level redaction rather than message-level scrubbing.
- The same inference surface could be tested on other consumer LLM services to check whether the leakage pattern is model- or interface-specific.
- Error asymmetry concentrated on women in technical fields, older users with contemporary skills, and Global South tech professionals suggests downstream effects on targeted advertising or content moderation if the inferences are used operationally.
Load-bearing premise
The LLM-based filter accurately excludes every message containing direct statements of age, gender, or country so that measured performance reflects only implicit patterns.
What would settle it
A manual audit of the filtered cohort that finds residual explicit demographic statements, or a re-run of the inference task on logs hand-verified to contain no such statements in which accuracy falls to chance levels.
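A manual audit yields a sample count, so a confidence bound is needed to say how many explicit statements could remain in the full cohort. The sketch below uses the Wilson score interval for a binomial proportion; the audit size of 500 and the zero-hit outcome are hypothetical numbers chosen for illustration, not results from the paper.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical audit: 500 retained messages hand-checked, 0 explicit statements found.
low, high = wilson_interval(0, 500)
print(f"residual explicit rate: [{low:.4f}, {high:.4f}]")
```

Even a clean audit of this size leaves an upper bound a little under one percent, so a finding of "no residual statements" in a sample does not by itself show the filter removed every one.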
Original abstract
Hundreds of millions of users now hold detailed, multi-turn conversations with ChatGPT and similar LLM assistants. We measure two privacy-relevant features of these conversations on a corpus of complete ChatGPT histories donated by over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan). First, on explicit disclosure: 34.5% of user messages contain personal information across a twenty-category taxonomy, with the median user first revealing identifying content within the first 14% of their conversation history. Second, on inference beyond explicit disclosure: we restrict to a cohort whose conversations contain no messages flagged by an LLM-based filter for explicit demographic self-identification (a separate NER pass marks PII for the disclosure audit but does not drive cohort exclusion). On this filtered cohort, an off-the-shelf large language model still recovers each user's age, gender, and country at weighted F1 of 0.84, 0.90, and 0.88, respectively, with the median user identified from the first 5% of their conversation history. Reading the model's natural-language reasoning traces, we identify four recurring stereotype patterns that drive both successful inference and an asymmetric error distribution concentrating on women in technical fields, older users with contemporary skills, and Global South tech professionals. We also compare ChatGPT against the same users' Google Search and YouTube histories as inference surfaces, and find it competitive with these older substrates that have driven behavioral advertising for two decades. Message-level PII removal is insufficient on its own as a privacy intervention for conversational AI data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper empirically measures privacy leakage in a corpus of complete ChatGPT conversation histories donated by over 1,000 users in Brazil, India, Nigeria, and Pakistan. It reports that 34.5% of user messages contain personal information across a 20-category taxonomy (median first disclosure at 14% of history) and, on a cohort filtered to exclude explicit demographic self-identifications via an LLM-based filter, shows that an off-the-shelf LLM recovers age, gender, and country at weighted F1 scores of 0.84, 0.90, and 0.88 respectively, with median identification from the first 5% of history. The work also extracts recurring stereotype patterns from model traces and compares inference performance to the same users' Google Search and YouTube histories, concluding that message-level PII removal is insufficient.
Significance. If the filter is shown to be reliable, the results supply direct, reproducible evidence that conversational AI logs leak demographic attributes via implicit patterns at levels competitive with established behavioral-advertising substrates. The use of real donated multi-turn histories, an explicit 20-category taxonomy, and natural-language reasoning traces strengthens the contribution to privacy measurement in AI data pipelines.
major comments (2)
- [Abstract] Abstract (cohort restriction paragraph): the central claim that inference occurs 'beyond explicit disclosure' rests on the LLM-based filter having removed every message containing direct age/gender/country statements, yet the manuscript reports none of the following: the filter prompt, its accuracy on a labeled hold-out, its false-negative rate, or a post-filter manual audit of the retained cohort. Without these, the F1 scores cannot be attributed to implicit patterns.
- [Abstract] Abstract (inference results): the weighted F1 scores (0.84/0.90/0.88) and 'median user identified from the first 5%' are presented without the post-filter cohort size, confidence intervals, or any control for donation self-selection bias, which are load-bearing for interpreting the practical magnitude of the leakage.
minor comments (2)
- The description of the 20-category taxonomy and the separate NER pass for PII disclosure audit would benefit from an explicit table or appendix listing the categories and example annotations.
- The post-hoc identification of four stereotype patterns from reasoning traces is presented without inter-annotator agreement or a systematic coding scheme, which affects reproducibility of that qualitative component.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review. We address each major comment below and will revise the manuscript to improve clarity and completeness.
- Referee: [Abstract] Abstract (cohort restriction paragraph): the central claim that inference occurs 'beyond explicit disclosure' rests on the LLM-based filter having removed every message containing direct age/gender/country statements, yet the manuscript reports none of the following: the filter prompt, its accuracy on a labeled hold-out, its false-negative rate, or a post-filter manual audit of the retained cohort. Without these, the F1 scores cannot be attributed to implicit patterns.
  Authors: We agree that the manuscript should provide these details to support the claim of inference beyond explicit disclosure. The current version describes the filter but omits the prompt, hold-out accuracy, false-negative rate, and post-filter audit. In revision we will add the exact prompt, its accuracy on a labeled hold-out, the false-negative rate, and results from a manual audit of a sample of the retained cohort.
  Revision: yes
- Referee: [Abstract] Abstract (inference results): the weighted F1 scores (0.84/0.90/0.88) and 'median user identified from the first 5%' are presented without the post-filter cohort size, confidence intervals, or any control for donation self-selection bias, which are load-bearing for interpreting the practical magnitude of the leakage.
  Authors: We will add the post-filter cohort size and bootstrap confidence intervals for the F1 scores to the abstract and main text. Self-selection bias cannot be fully controlled given the voluntary donation design; we will add an explicit discussion of this limitation while preserving the core empirical finding on inferential leakage.
  Revision: partial
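The bootstrap confidence intervals promised in the rebuttal could be computed along the following lines. This is a sketch of a standard percentile bootstrap that resamples users with replacement; it is not the authors' stated procedure, and the function names, resample count, and toy gender labels are assumptions made for illustration.

```python
import random
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted F1, using F1 = 2*TP / (support + predicted count)."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(
        n * (2 * correct[c] / (n + predicted[c])) for c, n in support.items()
    ) / len(y_true)

def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling users with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            weighted_f1([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Toy labels invented for illustration (not data from the paper).
y_true = ["F"] * 30 + ["M"] * 30
y_pred = ["F"] * 27 + ["M"] * 3 + ["M"] * 26 + ["F"] * 4
point = weighted_f1(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"weighted F1 = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The width of the interval shrinks with the number of users, which is why the post-filter cohort size is needed to judge how tightly the reported F1 scores are pinned down.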
Circularity Check
No circularity: direct empirical measurements on external data
The paper reports measured inference performance (weighted F1 scores) obtained by applying off-the-shelf LLMs to a filtered subset of user-donated conversation histories. No equations, fitted parameters, or derivations are present that would reduce these F1 values to quantities defined by the study itself. Cohort filtering and PII marking are described as preprocessing steps whose outputs are then evaluated externally; the central results remain independent measurements rather than self-referential constructions. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support the reported numbers.
Axiom & Free-Parameter Ledger
free parameters (1)
- 20-category personal information taxonomy
axioms (1)
- domain assumption: the LLM-based filter correctly identifies and removes all explicit demographic self-identification
Appendix excerpts
A.1 Model selection and validation. For each of the three LLM-driven classification tasks in the paper (SAFE/UNSAFE filtering, twenty-category disclos...
... and gender (Table 10). Religion, education level, monthly income, and voting preference appear only at the ...
Table 7: Cross-country distribution of flagged messages across the twenty disclosure categories. Columns sum to 100% within country.
Category              Brazil (%)  India (%)  Nigeria (%)  Pakistan (%)
Job and education     25.54       27.05      17.07        31.78
Lifestyle and habits...