arxiv: 2604.19785 · v2 · submitted 2026-03-31 · 💻 cs.CL · cs.AI · cs.CR · cs.CY

Can LLMs Infer Conversational Agent Users' Personality Traits from Chat History?

Pith reviewed 2026-05-13 23:01 UTC · model grok-4.3

classification 💻 cs.CL cs.AI cs.CR cs.CY
keywords personality inference · privacy risks · conversational agents · chat history · large language models · trait classification · RoBERTa · ChatGPT logs

The pith

Fine-tuned models infer personality traits like extraversion from ChatGPT chat histories at rates above random guessing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper collects 62,090 real chat logs from 668 users and trains RoBERTa-base classifiers to predict Big Five personality traits from those interactions. In several cases the models exceed random baselines on ternary classification, with the largest gain shown for extraversion on logs involving relationships and personal reflection. A reader would care because everyday conversations with AI agents may leak stable personal attributes without any explicit query, creating avenues for influence or misuse of derived information.

Core claim

We collected actual ChatGPT logs from N=668 participants containing 62,090 individual chats and fine-tuned RoBERTa-base text classification models to infer personality traits from CA interactions. The findings show that these models achieve trait inference with accuracy (ternary classification) better than random in multiple cases. For example, for extraversion, accuracy improves by +44% relative to the baseline on interactions for relationships and personal reflection.
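
To make the "+44% relative to the baseline" figure concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, a uniform chance baseline of 1/3 for ternary classification; the text above does not state the baseline value actually used, and the function names are hypothetical.

```python
def relative_gain(accuracy, baseline):
    """Relative improvement of accuracy over a baseline (0.44 means +44%)."""
    return (accuracy - baseline) / baseline


def implied_accuracy(baseline, gain):
    """Absolute accuracy implied by a baseline and a relative gain."""
    return baseline * (1 + gain)


# ASSUMPTION: uniform chance baseline for three classes (not confirmed by the source).
chance = 1 / 3
accuracy = implied_accuracy(chance, 0.44)
print(f"implied accuracy: {accuracy:.2f}")
print(f"relative gain:    {relative_gain(accuracy, chance):+.0%}")
```

Under that assumed baseline, a +44% relative gain corresponds to an absolute accuracy of roughly 0.48, that is, about 15 percentage points above chance. A different baseline (for example, a majority-class rate) would give a different absolute figure.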

What carries the argument

RoBERTa-base text classification models fine-tuned on conversational chat logs to predict personality traits via ternary classification.

If this is right

  • Interactions about relationships and personal reflection carry higher risk for inferring extraversion than other use cases.
  • Privacy risks vary systematically by conversation type rather than being uniform across all chats.
  • Derived personality information can be obtained without users stating their traits directly.
  • The same logs that support normal use also enable trait inference at measurable rates above chance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Users may want to treat all exchanges with conversational agents as potentially revealing rather than private.
  • Similar inference pipelines could be tested on other attributes such as emotional state or decision biases.
  • Service providers could add on-device filtering or noise injection to reduce unintended trait leakage.
  • Policy discussions around AI privacy should include derived personal data obtained from interaction patterns.

Load-bearing premise

The collected chat logs contain sufficient and unbiased signals of personality traits, and the participants' self-reported personality labels are accurate and stable enough to serve as ground truth.

What would settle it

A replication that pairs the same chat logs with personality labels obtained independently (for example through observer ratings rather than self-report) and finds that model accuracy drops to random levels.
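
Whether an accuracy has "dropped to random levels" is a statistical question. One simple way to check it is a one-sided exact binomial test against the chance rate, sketched below. This is an illustrative choice, not the test the paper is known to use; the counts are made up, and the function name is hypothetical. The test also treats predictions as independent, which is only an approximation when several chats come from the same user.

```python
from math import comb


def binomial_tail_p(correct, total, chance):
    """P(X >= correct) for X ~ Binomial(total, chance): a one-sided p-value."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# HYPOTHETICAL counts: 100 held-out predictions, ternary chance rate of 1/3.
for correct in (34, 48):
    p = binomial_tail_p(correct, 100, 1 / 3)
    print(f"{correct}/100 correct: p = {p:.4f}")
```

A small p-value indicates accuracy that chance alone is unlikely to produce; a large one means the result is not distinguishable from random guessing at that sample size.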

Figures

Figures reproduced from arXiv: 2604.19785 by Derya Cögendez, Noé Zufferey, Verena Zimmermann.

Figure 1: Number of times participants shared specific personal information with ChatGPT (N=668).
Figure 2: Number of times participants interacted with ChatGPT for specific use cases (N=668).

Original abstract

Sensitive information, such as knowledge about an individual's personality, can be can be misused to influence behavior (e.g., via personalized messaging). To assess to what extent an individual's personality can be inferred from user interactions with LLM-based conversational agents (CAs), we analyze and quantify related privacy risks of using CAs. We collected actual ChatGPT logs from N=668 participants, containing 62,090 individual chats, and report statistics about the different types of shared data and use cases. We fine-tuned RoBERTa-base text classification models to infer personality traits from CA interactions. The findings show that these models achieve trait inference with accuracy (ternary classification) better than random in multiple cases. For example, for extraversion, accuracy improves by +44% relative to the baseline on interactions for relationships and personal reflection. This research highlights how interactions with CAs pose privacy risks and provides fine-grained insights into the level of risk associated with different types of interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that LLMs can infer users' personality traits from interactions with conversational agents. Using real ChatGPT logs collected from N=668 participants (62,090 chats), the authors fine-tune RoBERTa-base models for ternary classification of personality traits and report accuracies better than random baselines, including a +44% relative improvement for extraversion on interactions involving relationships and personal reflection. The work quantifies privacy risks associated with different types of CA usage.

Significance. If the empirical results hold after methodological clarification, the paper would offer concrete evidence of privacy vulnerabilities in LLM-based conversational agents, with fine-grained analysis of risk levels across interaction categories. The use of actual user logs rather than synthetic data is a positive aspect that could strengthen the contribution to privacy research in NLP.

major comments (2)
  1. [Abstract] The central claim of better-than-random ternary classification accuracy (e.g., +44% relative gain for extraversion) rests on the quality of the personality labels as ground truth. The abstract supplies no information on the assessment instrument, recruitment procedure, label reliability (test-retest or behavioral validation), or potential confounds, making it impossible to determine whether reported lifts reflect conversational signals or label artifacts.
  2. [Abstract] No description is given of the train/test split strategy, baseline construction, or statistical tests confirming that accuracies exceed chance. These omissions are load-bearing because the headline result cannot be verified or replicated without them.
minor comments (1)
  1. [Abstract] Typographical error in the sentence 'Sensitive information, such as knowledge about an individual's personality, can be can be misused'.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive comments. We have revised the abstract to supply the requested methodological details on label construction, data partitioning, and evaluation. Point-by-point responses follow.

Point-by-point responses
  1. Referee: [Abstract] The central claim of better-than-random ternary classification accuracy (e.g., +44% relative gain for extraversion) rests on the quality of the personality labels as ground truth. The abstract supplies no information on the assessment instrument, recruitment procedure, label reliability (test-retest or behavioral validation), or potential confounds, making it impossible to determine whether reported lifts reflect conversational signals or label artifacts.

    Authors: We agree the abstract was too terse. The revised abstract now states that traits were measured with the 44-item Big Five Inventory (BFI) via self-report, that participants were recruited through online platforms with informed consent, and that labels were obtained by tertile splits. The full manuscript already describes topic-level controls and reports consistent gains across multiple categories, which would be unlikely under pure label noise. We note that no additional test-retest or behavioral validation was collected for this sample. revision: yes

  2. Referee: [Abstract] No description is given of the train/test split strategy, baseline construction, or statistical tests confirming that accuracies exceed chance. These omissions are load-bearing because the headline result cannot be verified or replicated without them.

    Authors: We accept the criticism. The revised abstract now indicates that an 80/20 user-stratified split was used to avoid leakage, that the random baseline reflects the majority-class proportion, and that improvements were assessed with statistical tests against chance. These details were already present in the methods and results sections; the abstract has been expanded to make them visible at a glance. revision: yes
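
The label construction and evaluation setup described in the two simulated responses (tertile labels, an 80/20 split by user, and a majority-class baseline) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy data, the handling of ties at the cut points, and the choice to group by user without additionally stratifying by label are all hypothetical.

```python
import random
from collections import Counter
from statistics import quantiles


def tertile_labels(scores):
    """Map raw trait scores to 'low' / 'medium' / 'high' by tertile."""
    # Two cut points dividing the sample into three roughly equal groups.
    # ASSUMPTION: scores equal to a cut point fall into the lower group.
    low_cut, high_cut = quantiles(scores, n=3, method="inclusive")
    return [
        "low" if s <= low_cut else "medium" if s <= high_cut else "high"
        for s in scores
    ]


def split_by_user(chats, test_fraction=0.2, seed=0):
    """Hold out whole users so no user's chats appear in both train and test."""
    users = sorted({chat["user"] for chat in chats})
    random.Random(seed).shuffle(users)
    n_test = max(1, round(len(users) * test_fraction))
    test_users = set(users[:n_test])
    train = [c for c in chats if c["user"] not in test_users]
    test = [c for c in chats if c["user"] in test_users]
    return train, test


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# TOY DATA: 10 hypothetical users with made-up trait scores, 3 chats each.
user_scores = {f"user{i}": float(i) for i in range(10)}
user_label = dict(zip(user_scores, tertile_labels(list(user_scores.values()))))
chats = [
    {"user": u, "text": f"chat {k}", "label": user_label[u]}
    for u in user_scores
    for k in range(3)
]

train, test = split_by_user(chats)
print(len(train), "train chats,", len(test), "test chats")
print("majority baseline on train:", majority_baseline([c["label"] for c in train]))
```

Splitting at the user level rather than the chat level matters here because every chat from one user carries the same trait label; a chat-level split would let the classifier see the same person on both sides and inflate accuracy.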

standing simulated objections not resolved
  • Test-retest reliability or behavioral validation of the personality labels beyond standard BFI self-report administration

Circularity Check

0 steps flagged

No circularity: empirical supervised inference on collected logs and external labels

full rationale

The paper's core result is obtained by collecting 62,090 real ChatGPT logs from 668 participants, obtaining separate personality trait labels (presumably via standard self-report instruments), fine-tuning RoBERTa-base classifiers on the chat text, and measuring ternary classification accuracy against held-out data. Reported lifts such as +44% relative improvement for extraversion are direct performance metrics on this pipeline. No step redefines the target labels in terms of the model's outputs, renames a fitted parameter as a prediction, or relies on a self-citation chain to establish uniqueness or an ansatz. The derivation is therefore self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard NLP transfer learning assumptions and the empirical performance of fine-tuned models on the collected dataset; no explicit free parameters, invented entities, or non-standard axioms are stated.

axioms (1)
  • [standard math] RoBERTa-base can be fine-tuned for ternary text classification of personality traits from chat logs
    Standard assumption in NLP transfer learning literature.

pith-pipeline@v0.9.0 · 5478 in / 1187 out tokens · 53613 ms · 2026-05-13T23:01:32.956448+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Inferential Privacy Leakage in Anonymized Conversational AI Logs

    cs.CY 2026-05 unverdicted novelty 6.0

    LLM-based inference recovers user age, gender, and country from filtered ChatGPT logs at weighted F1 scores of 0.84-0.90, with median identification from the first 5% of history, driven by stereotype patterns.
