arxiv: 2605.23820 · v1 · pith:WJW36TMMnew · submitted 2026-05-22 · 💻 cs.CY · cs.SI

Inferential Privacy Leakage in Anonymized Conversational AI Logs

Pith reviewed 2026-05-25 02:42 UTC · model grok-4.3

classification 💻 cs.CY cs.SI
keywords inferential privacy · conversational AI · anonymization · ChatGPT logs · demographic inference · privacy leakage · Global South users · stereotype patterns

The pith

Even among users whose ChatGPT histories contain no message flagged by an LLM filter for explicit demographic statements, an off-the-shelf model still recovers age, gender, and country at weighted F1 scores of 0.84, 0.90, and 0.88.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines donated ChatGPT conversation histories from over 1,000 users in Brazil, India, Nigeria, and Pakistan. It first documents that 34.5 percent of user messages contain personal information from a twenty-category taxonomy, with the median user first revealing identifying content within the first 14 percent of their history. It then restricts analysis to a cohort whose logs contain no messages flagged by an LLM filter for explicit age, gender, or country statements. On this filtered cohort, an off-the-shelf LLM recovers age, gender, and country at weighted F1 scores of 0.84, 0.90, and 0.88, with the median user identified from the first 5 percent of their history, by exploiting recurring stereotype patterns. The work concludes that message-level removal of personally identifiable information is insufficient on its own to prevent demographic inference from conversational AI data.
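The "first 14 percent" figure rests on a per-user discovery point: the fraction of a history at which the first flagged message occurs. A minimal sketch of that quantity follows, assuming a 1-based message position divided by history length; the paper's exact indexing convention is not stated here, and the function name and toy histories are invented for illustration.

```python
from statistics import median


def discovery_point(flags):
    """Fraction of a history read when the first flagged message appears.

    `flags` is a chronological list of booleans (True = message flagged).
    Returns None when no message is flagged.
    Assumption: positions are 1-based, so a flag on the last message gives 1.0.
    """
    for index, flagged in enumerate(flags):
        if flagged:
            return (index + 1) / len(flags)
    return None


# Toy histories, invented for illustration only (not data from the paper).
histories = {
    "user_a": [False, True, False, False],         # first flag at 2/4
    "user_b": [False] * 9 + [True],                # first flag at 10/10
    "user_c": [True, False, False, False, False],  # first flag at 1/5
    "user_d": [False, False, False],               # never flagged
}

# Users with no flagged message have no discovery point and are skipped.
points = [p for p in map(discovery_point, histories.values()) if p is not None]
median_point = median(points)
print(median_point)
```

Taking the median over users, rather than pooling all messages, keeps long histories from dominating the summary.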

Core claim

On the filtered cohort whose conversations contain no messages flagged by an LLM-based filter for explicit demographic self-identification, an off-the-shelf large language model recovers each user's age, gender, and country at weighted F1 scores of 0.84, 0.90, and 0.88 respectively, with the median user identified from the first 5 percent of their conversation history. Reading the model's natural-language reasoning traces reveals four recurring stereotype patterns that drive both successful inference and an asymmetric error distribution. As an inference surface, ChatGPT history is competitive with the same users' Google Search and YouTube histories.
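For readers unfamiliar with the metric, here is a minimal sketch of weighted F1, assuming the usual definition (per-class F1 averaged with weights proportional to each class's count among the true labels, as in scikit-learn's `average="weighted"`). Whether the paper computes it exactly this way is an assumption; the labels below are toy values, not the paper's data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is > 0 because count > 0.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (count / total) * f1
    return score


# Toy labels, invented for illustration only.
y_true = ["f", "f", "f", "m"]
y_pred = ["f", "f", "m", "m"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a high weighted F1 can coexist with poor performance on a small class, which is one reason the confusion matrices matter alongside the headline numbers.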

What carries the argument

LLM inference from implicit conversational patterns on an explicitly filtered cohort that excludes all direct demographic self-identification

If this is right

  • Message-level PII removal alone does not prevent demographic recovery from conversational histories.
  • Inference from ChatGPT logs is competitive with inference from the same users' Google Search and YouTube histories, the substrates behind two decades of behavioral advertising.
  • Four recurring stereotype patterns explain both high accuracy and systematic errors on specific user groups.
  • The median user can be identified from only the first five percent of their conversation history.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platforms releasing conversational logs for research or training may need conversation-level or user-level redaction rather than message-level scrubbing.
  • The same inference surface could be tested on other consumer LLM services to check whether the leakage pattern is model- or interface-specific.
  • Error asymmetry concentrated on women in technical fields and older Global South professionals suggests downstream effects on targeted advertising or content moderation if the inferences are used operationally.

Load-bearing premise

The LLM-based filter flags every message containing a direct statement of age, gender, or country, so that the retained cohort contains none and the measured performance reflects only implicit patterns.
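The cohort restriction itself can be sketched in a few lines. This assumes per-message boolean flags from a filter are already available; the function name, user ids, and flags are hypothetical, and the paper's actual filter prompt and pipeline are not reproduced here.

```python
def filtered_cohort(flags_by_user):
    """Return the users whose histories contain no flagged message.

    `flags_by_user` maps a user id to a list of booleans, one per message,
    where True means the (hypothetical) filter flagged that message.
    """
    return {user for user, flags in flags_by_user.items() if not any(flags)}


# Toy flags, invented for illustration only.
flags_by_user = {
    "user_a": [False, False, True],   # one flagged message: user excluded
    "user_b": [False, False, False],  # nothing flagged: user retained
    "user_c": [True, True],           # excluded
}

cohort = filtered_cohort(flags_by_user)
print(cohort)
```

Under this user-level rule a single missed message is enough to keep a user with an explicit statement in the cohort, which is why the filter's false-negative rate carries so much weight.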

What would settle it

Either of two results would settle it: a manual audit of the filtered cohort that finds residual explicit demographic statements, or a re-run of the inference task on logs hand-verified to contain no such statements in which accuracy falls to chance levels.
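An audit that finds nothing still bounds the residual rate only as tightly as its sample size allows. The sketch below computes the exact one-sided upper confidence bound on the rate of missed statements when zero are found among n audited messages, assuming the audited messages are an independent random sample; the function name and the sample size of 300 are illustrative choices, not values from the paper.

```python
def zero_finding_upper_bound(n_audited, confidence=0.95):
    """Upper bound on the true rate p when an audit of n items finds none.

    Solves (1 - p) ** n = 1 - confidence for p. For 95% confidence this is
    slightly below the familiar "rule of three" approximation 3 / n.
    """
    if n_audited <= 0:
        raise ValueError("n_audited must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_audited)


# Illustrative sample size only.
bound = zero_finding_upper_bound(300)
print(f"{bound:.4f}")
```

In other words, a clean audit of a few hundred messages rules out a residual rate above roughly one percent, not a residual rate of zero.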

Figures

Figures reproduced from arXiv: 2605.23820 by Kiran Garimella, S M Mehedi Zaman.

Figure 1: Distribution of the discovery point Pdiscovery, the fraction of a user's conversation history at which the first flagged message occurs. Disclosure accumulates linearly at the cohort level. Figure 2 plots the cumulative number of flagged messages against the fraction of history read, averaged across users. The relationship is approximately linear, which is to say that at the cohort-aggregate level the fla…
Figure 2: Cumulative count of flagged messages against …
Figure 3: Confusion matrix for age-bracket prediction on the …
Figure 4: Confusion matrix for gender prediction on the …
Figure 5: Confusion matrix for country prediction on the …
Figure 6: Distribution of context-needed (% of conversation …
Figure 7: Distribution of context-needed (% of conversation …
Figure 8: Distribution of context-needed (% of conversation …

Original abstract

Hundreds of millions of users now hold detailed, multi-turn conversations with ChatGPT and similar LLM assistants. We measure two privacy-relevant features of these conversations on a corpus of complete ChatGPT histories donated by over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan). First, on explicit disclosure: 34.5% of user messages contain personal information across a twenty-category taxonomy, with the median user first revealing identifying content within the first 14% of their conversation history. Second, on inference beyond explicit disclosure: we restrict to a cohort whose conversations contain no messages flagged by an LLM-based filter for explicit demographic self-identification (a separate NER pass marks PII for the disclosure audit but does not drive cohort exclusion). On this filtered cohort, an off-the-shelf large language model still recovers each user's age, gender, and country at weighted F1 of 0.84, 0.90, and 0.88, respectively, with the median user identified from the first 5% of their conversation history. Reading the model's natural-language reasoning traces, we identify four recurring stereotype patterns that drive both successful inference and an asymmetric error distribution concentrating on women in technical fields, older users with contemporary skills, and Global South tech professionals. We also compare ChatGPT against the same users' Google Search and YouTube histories as inference surfaces, and find it competitive with these older substrates that have driven behavioral advertising for two decades. Message-level PII removal is insufficient on its own as a privacy intervention for conversational AI data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper empirically measures privacy leakage in a corpus of complete ChatGPT conversation histories donated by over 1,000 users in Brazil, India, Nigeria, and Pakistan. It reports that 34.5% of user messages contain personal information across a 20-category taxonomy (median first disclosure at 14% of history) and, on a cohort filtered to exclude explicit demographic self-identifications via an LLM-based filter, shows that an off-the-shelf LLM recovers age, gender, and country at weighted F1 scores of 0.84, 0.90, and 0.88 respectively, with median identification from the first 5% of history. The work also extracts recurring stereotype patterns from model traces and compares inference performance to the same users' Google Search and YouTube histories, concluding that message-level PII removal is insufficient.

Significance. If the filter is shown to be reliable, the results supply direct, reproducible evidence that conversational AI logs leak demographic attributes via implicit patterns at levels competitive with established behavioral-advertising substrates. The use of real donated multi-turn histories, an explicit 20-category taxonomy, and natural-language reasoning traces strengthens the contribution to privacy measurement in AI data pipelines.

major comments (2)
  1. [Abstract] Abstract (cohort restriction paragraph): the central claim that inference occurs 'beyond explicit disclosure' rests on the LLM-based filter having removed every message containing direct age/gender/country statements, yet the manuscript reports neither the prompt, accuracy on a labeled hold-out, false-negative rate, nor any post-filter manual audit of the retained cohort. Without these, the F1 scores cannot be attributed to implicit patterns.
  2. [Abstract] Abstract (inference results): the weighted F1 scores (0.84/0.90/0.88) and 'median user identified from the first 5%' are presented without the post-filter cohort size, confidence intervals, or any control for donation self-selection bias, which are load-bearing for interpreting the practical magnitude of the leakage.
minor comments (2)
  1. The description of the 20-category taxonomy and the separate NER pass for PII disclosure audit would benefit from an explicit table or appendix listing the categories and example annotations.
  2. The post-hoc identification of four stereotype patterns from reasoning traces is presented without inter-annotator agreement or a systematic coding scheme, which affects reproducibility of that qualitative component.
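The agreement check requested in the second minor comment could take the form of Cohen's kappa over two annotators' pattern labels. A minimal sketch follows, assuming each reasoning trace receives exactly one label from each annotator; the label names and annotations are placeholders, since the four patterns are not named here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both annotators labelled independently at random
    # with their own observed label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both annotators use one label")
    return (observed - expected) / (1.0 - expected)


# Placeholder annotations, invented for illustration only.
annotator_1 = ["pattern_1", "pattern_1", "pattern_2", "pattern_2"]
annotator_2 = ["pattern_1", "pattern_1", "pattern_2", "pattern_1"]
print(cohens_kappa(annotator_1, annotator_2))
```

Raw agreement here is 0.75, but half of that is expected by chance given the label frequencies, so kappa comes out lower.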

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review. We address each major comment below and will revise the manuscript to improve clarity and completeness.

  1. Referee: [Abstract] Abstract (cohort restriction paragraph): the central claim that inference occurs 'beyond explicit disclosure' rests on the LLM-based filter having removed every message containing direct age/gender/country statements, yet the manuscript reports neither the prompt, accuracy on a labeled hold-out, false-negative rate, nor any post-filter manual audit of the retained cohort. Without these, the F1 scores cannot be attributed to implicit patterns.

    Authors: We agree that the manuscript should provide these details to support the claim of inference beyond explicit disclosure. The current version describes the filter but omits the prompt, hold-out accuracy, false-negative rate, and post-filter audit. In revision we will add the exact prompt, its accuracy on a labeled hold-out, the false-negative rate, and results from a manual audit of a sample of the retained cohort. revision: yes

  2. Referee: [Abstract] Abstract (inference results): the weighted F1 scores (0.84/0.90/0.88) and 'median user identified from the first 5%' are presented without the post-filter cohort size, confidence intervals, or any control for donation self-selection bias, which are load-bearing for interpreting the practical magnitude of the leakage.

    Authors: We will add the post-filter cohort size and bootstrap confidence intervals for the F1 scores to the abstract and main text. Self-selection bias cannot be fully controlled given the voluntary donation design; we will add an explicit discussion of this limitation while preserving the core empirical finding on inferential leakage. revision: partial
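The bootstrap confidence intervals promised in the second response could be computed along these lines. This is a sketch of a percentile bootstrap over users, assuming one true and one predicted label per user; the resample count, seed, and toy labels are illustrative, and the authors' eventual procedure may differ.

```python
import random
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        score += (count / len(y_true)) * (2 * tp / (2 * tp + fp + fn))
    return score


def bootstrap_ci(y_true, y_pred, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for weighted F1, resampling users."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Draw n users with replacement and rescore the resample.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(weighted_f1([y_true[i] for i in idx],
                                  [y_pred[i] for i in idx]))
    scores.sort()
    tail = (1.0 - confidence) / 2.0
    lower = scores[int(tail * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lower, upper


# Toy labels, invented for illustration only.
y_true = ["a"] * 20 + ["b"] * 20
y_pred = ["a"] * 17 + ["b"] * 3 + ["b"] * 18 + ["a"] * 2
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"{lower:.3f} to {upper:.3f}")
```

Resampling at the user level, rather than the message level, matches the unit at which the demographic labels are defined, and the interval widens as the post-filter cohort shrinks.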

Circularity Check

0 steps flagged

No circularity: direct empirical measurements on external data

The paper reports measured inference performance (weighted F1 scores) obtained by applying off-the-shelf LLMs to a filtered subset of user-donated conversation histories. No equations, fitted parameters, or derivations are present that would reduce these F1 values to quantities defined by the study itself. Cohort filtering and PII marking are described as preprocessing steps whose outputs are then evaluated externally; the central results remain independent measurements rather than self-referential constructions. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support the reported numbers.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The study is an empirical measurement relying on data donation and off-the-shelf LLM use rather than new theoretical constructs or derivations.

free parameters (1)
  • 20-category personal information taxonomy
    Chosen to classify explicit disclosures; details of category definitions and inter-annotator agreement not provided in abstract.
axioms (1)
  • domain assumption: LLM-based filter correctly identifies and removes all explicit demographic self-identification
    Used to define the inference-only cohort; accuracy of this filter is not validated in the abstract.

pith-pipeline@v0.9.0 · 5816 in / 1365 out tokens · 33302 ms · 2026-05-25T02:42:40.793858+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 1 internal anchor

  1. [1]

    InProceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES), 298–306

    Persistent Anti- Muslim Bias in Large Language Models. InProceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES), 298–306. Bender, E. M.; Gebru, T.; McMillan-Major, A.; and Shmitchell, S

  2. [2]

    Cao, B.; Wen, C.; Scherr, S.; Kobayashi, T.; and Jiang, L

    On the Dangers of Stochastic Par- rots: Can Language Models Be Too Big? InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT), 610–623. Cao, B.; Wen, C.; Scherr, S.; Kobayashi, T.; and Jiang, L. C

  3. [3]

    Can LLMs Infer Conversational Agent Users' Personality Traits from Chat History?

    Can LLMs Infer Conversational Agent Users’ Personality Traits from Chat History?arXiv preprint arXiv:2604.19785. Dash, A.; Das, S.; Kirsten, E.; Wu, Q.; Karnam, S. K.; Gum- madi, K. P.; Holz, T.; Zafar, M. B.; and Zannettou, S

  4. [4]

    Dou, Y .; Krsek, I.; Naous, T.; Kabra, A.; Das, S.; Ritter, A.; and Xu, W

    The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPT.arXiv preprint arXiv:2602.01450. Dou, Y .; Krsek, I.; Naous, T.; Kabra, A.; Das, S.; Ritter, A.; and Xu, W

  5. [5]

    Mireshghallah, N.; Antoniak, M.; More, Y .; Choi, Y .; and Farnadi, G

    Bowling with ChatGPT: On the Evolving User Interactions with Conversational AI Sys- tems.arXiv preprint arXiv:2602.01114. Mireshghallah, N.; Antoniak, M.; More, Y .; Choi, Y .; and Farnadi, G

  6. [6]

    arXiv preprint arXiv:2407.11438 , year=

    Trust no bot: Discovering personal disclo- sures in human-llm conversations in the wild.arXiv preprint arXiv:2407.11438. Mohamed, S.; Png, M.-T.; and Isaac, W

  7. [7]

    InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT), 315–328

    Re-imagining Algorithmic Fairness in India and Beyond. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT), 315–328. Schwartz, H. A.; Eichstaedt, J. C.; Kern, M. L.; Dziurzynski, L.; Ramones, S. M.; Agrawal, M.; Shah, A.; Kosinski, M.; Stillwell, D.; Seligman, M. E. P.; and Ungar, L. H

  8. [8]

    InProceedings of the 2019 Conference on Em- pirical Methods in Natural Language Processing (EMNLP), 3407–3412

    The Woman Worked as a Babysitter: On Biases in Language Generation. InProceedings of the 2019 Conference on Em- pirical Methods in Natural Language Processing (EMNLP), 3407–3412. Staab, R.; Vero, M.; Balunovi ´c, M.; and Vechev, M

  9. [9]

    Staufer, D.; and Morehouse, K

    Beyond memorization: Violating privacy via inference with large language models.arXiv preprint arXiv:2310.07298. Staufer, D.; and Morehouse, K

  10. [10]

    Turpin, M.; Michael, J.; Perez, E.; and Bowman, S

    What Do LLMs Asso- ciate with Your Name? A Human-Centered Black-Box Au- dit of Personal Data.arXiv preprint arXiv:2602.17483. Turpin, M.; Michael, J.; Perez, E.; and Bowman, S. R

  11. [11]

    I am 25",

    Language Models Don’t Always Say What They Think: Un- faithful Explanations in Chain-of-Thought Prompting. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS). A Appendix A.1 Model selection and validation For each of the three LLM-driven classification tasks in the paper (SAFE/UNSAFE filtering, twenty-category disclos...

  12. [12]

    Personal Data Type

    and gender (Table 10). Religion, education level, monthly income, and voting preference appear only at the Table 7: Cross-country distribution of flagged messages across the twenty disclosure categories. Columns sum to 100% within country. Category Brazil (%) India (%) Nigeria (%) Pakistan (%) Job and education 25.54 27.05 17.07 31.78 Lifestyle and habits...