Personality Without Persons? A Psychometric Critique of Big Five Testing in Large Language Models

Anna Korhonen; Cristina Cachero; Kim Zierahn; Nuria Oliver

arxiv: 2607.02325 · v1 · pith:GYFVGXXTnew · submitted 2026-07-02 · 💻 cs.HC

Pith reviewed 2026-07-03 05:57 UTC · model grok-4.3

keywords Big Five · personality inventories · large language models · psychometrics · content validity · factor analysis · AI evaluation · alignment training

The pith

Big Five personality inventories do not measure an equivalent construct in large language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper systematically tests whether human Big Five personality inventories can validly characterize LLMs. Adapted items reach acceptable content validity, yet scores across 244 models from 49 families show almost no meaningful variation between systems. Factor analysis fails to recover the expected five-factor structure, with four facets collapsing together at correlations of .92 or higher, and instruction tuning shifts responses toward socially desirable poles. These results indicate that human personality frameworks produce misleading characterizations when applied to LLMs for benchmarking or governance.

Core claim

Big Five inventories adapted for LLMs reach sufficient content validity. When administered to 244 models, however, between-model differences account for only 3 percent of total score variance, and the responses fail to recover the five-factor structure: four of the five facets collapse into a single dimension at r greater than or equal to .92. Direct base-versus-instruction-tuned comparisons suggest that alignment training systematically moves scores toward socially desirable traits.
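To make the 3 percent figure concrete, here is a minimal sketch of a between-model variance share computed with a one-way ANOVA decomposition. It is not the paper's procedure, which is not specified here. It assumes a balanced design with the same number of repeated scores per model, and the function name, the simulated data, and the 50 repetitions per model are all hypothetical.

```python
import random
from statistics import fmean


def between_model_variance_share(scores_by_model):
    """Estimate the share of total score variance that lies between models.

    scores_by_model: list of equal-length lists, one inner list of repeated
    scores per model (a balanced design is assumed).
    """
    n_models = len(scores_by_model)
    n_reps = len(scores_by_model[0])
    model_means = [fmean(scores) for scores in scores_by_model]
    grand_mean = fmean(model_means)  # valid because the design is balanced

    # One-way ANOVA mean squares.
    ms_between = n_reps * sum((m - grand_mean) ** 2 for m in model_means) / (n_models - 1)
    ms_within = sum(
        (score - mean) ** 2
        for scores, mean in zip(scores_by_model, model_means)
        for score in scores
    ) / (n_models * (n_reps - 1))

    # Method-of-moments variance components; negative estimates are clipped to zero.
    var_between = max((ms_between - ms_within) / n_reps, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0


# Synthetic illustration only: 244 hypothetical models whose true
# between-model share of variance is set to 3%.
rng = random.Random(0)
simulated = []
for _ in range(244):
    model_effect = rng.gauss(0.0, 0.03 ** 0.5)
    simulated.append([model_effect + rng.gauss(0.0, 0.97 ** 0.5) for _ in range(50)])

share = between_model_variance_share(simulated)
print(f"estimated between-model share: {share:.3f}")
```

A share near zero means that knowing which model produced a score says little about the score itself, which is the sense in which the inventories fail to separate systems.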

What carries the argument

Psychometric evaluation of content validity followed by administration of the highest-validity inventory and confirmatory factor analysis on LLM response patterns.

If this is right

Big Five scores cannot be used to benchmark or compare LLMs because between-model differences account for almost none of the score variance.
Four of the five Big Five facets behave as a single dimension in LLM responses.
Alignment training produces consistent shifts in measured traits toward socially desirable responses.
Governance or safety claims based on Big Five LLM profiles rest on an unvalidated human construct.
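The alignment-shift point in the list above rests on comparing base and instruction-tuned variants of the same model. A rough sketch of such a paired comparison follows. The paper's actual test is not described here, so the summary statistics chosen (mean paired difference, a standardized paired effect size, and the share of pairs that moved upward) are assumptions, and the scores are invented for illustration.

```python
from statistics import fmean, stdev


def paired_shift(base_scores, tuned_scores):
    """Summarize how scores move from base to instruction-tuned variants.

    Both lists must be aligned so that position i refers to the same
    underlying model in its base and tuned form.
    """
    diffs = [tuned - base for base, tuned in zip(base_scores, tuned_scores)]
    mean_diff = fmean(diffs)
    # Standardized mean of the paired differences (often written d_z).
    effect_size = mean_diff / stdev(diffs)
    # Fraction of model pairs whose score increased after tuning.
    share_increased = sum(d > 0 for d in diffs) / len(diffs)
    return mean_diff, effect_size, share_increased


# Hypothetical trait scores on a 1-5 scale for four base/tuned pairs.
base = [3.0, 3.5, 4.0, 2.5]
tuned = [3.5, 4.0, 4.2, 3.1]

mean_diff, effect_size, share_increased = paired_shift(base, tuned)
print(f"mean shift: {mean_diff:.2f}, d_z: {effect_size:.2f}, share increased: {share_increased:.2f}")
```

Pairing by model keeps family-level differences out of the comparison, so a consistent positive shift can be attributed to the tuning step rather than to which families happen to be in the sample.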

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

New trait inventories built from LLM response distributions rather than human item pools would be required before personality-like constructs can support cross-model comparisons.
The collapse of facets may reflect the statistical regularities of next-token prediction rather than any internal motivational structure.
Repeated use of human inventories risks entrenching anthropomorphic assumptions in AI evaluation standards.

Load-bearing premise

That content-valid adapted items plus standard factor analysis on LLM responses suffice to conclude the inventories do not measure a human-equivalent personality construct.

What would settle it

Administering a new set of LLM-native items that recover five orthogonal factors with high inter-model variance and no collapse under the same 244-model sample would falsify the claim.

Figures

Figures reproduced from arXiv: 2607.02325 by Anna Korhonen, Cristina Cachero, Kim Zierahn, Nuria Oliver.

**Figure 2.** OCEAN score distributions and QQ-plots across all …

**Figure 3.** Covariance heatmap retrieved from CFA with a …

**Figure 4.** Big Five score distributions by model parameter sc…

**Figure 5.** OCEAN trait profiles by model family, expressed …

**Figure 6.** Scree plot and parallel analysis for EFA. Blue …

**Figure 7.** Big Five score distributions by reasoning capabil…

**Figure 8.** Big Five score across release dates. OLS regressio…

**Figure 9.** Big Five score across model parameter scale for ope…

**Figure 10.** Big Five score distributions by country of model o…

**Figure 11.** Big Five score distributions by model family. Kru…

Original abstract

Human personality inventories are increasingly used to characterize large language models (LLMs), compare systems, and inform downstream governance claims. Yet, these inventories were developed and validated for humans, and it remains unclear whether they apply to LLMs. We present a systematic psychometric evaluation of Big Five personality measurements in LLMs. We ask three research questions: Do Big Five inventories a) appropriately describe LLMs, b) capture inter-individual differences across models, and c) reflect internal factors consistent with human personality. We assess content validity of five candidate Big Five inventories and administer the winning inventory to N = 244 different models spanning 49 model families. First, we found that Big Five items adapted for LLMs can reach sufficient content validity, while original human-developed items did not. Second, Big Five inventories did not capture meaningful differences between LLMs: We found low variability between models, accounting for only 3% of total score variance. Third, LLMs responses did not recover the Big Five five-factor structure with four of the Big Five facets collapsing into one (r >= .92). Direct comparisons between base and instruction-tuned model variants suggested that alignment training systematically shifted Big Five scores toward socially desirable traits. These findings demonstrate that Big Five scores do not measure a construct equivalent to human personality in LLMs. Applying human personality frameworks to LLMs produces misleading characterizations used to benchmark, compare, and govern LLMs. We highlight the need for evaluation frameworks that are developed for LLMs, rather than adopting human constructs without validation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The 3% between-model variance makes the factor collapse unsurprising on statistical grounds, so it does not strongly support the claim that Big Five tests fail to capture an equivalent construct.

The main thing to know is that the reported 3% between-model variance undercuts the factor analysis result. With so little variation across the 244 models, high correlations and collapsed factors are what you'd expect statistically, even if the items were tapping distinct traits. That makes it hard to conclude that the inventories don't measure an equivalent construct.

The paper does a good job with the content validity assessment. They checked five inventories and found that adapted items can reach sufficient validity for LLMs, while the original human items do not. That's a useful distinction. Running the test on 244 models from 49 families gives a broad view, and the numbers on variance and the alignment shift toward socially desirable traits are concrete and worth noting. The direct comparison between base and instruction-tuned models is also a nice touch.

The soft spot is exactly that variance issue. The central claim rests on the non-recovery of the five-factor structure, but the stress-test note is right: low range means you can't distinguish absence of structure from lack of differentiation. The paper would be stronger if it had checked or discussed whether the sample has enough spread to test the structure properly. Also, the abstract mentions concrete numbers but the full methods details aren't clear from what's here, which leaves some uncertainty.

This work is aimed at researchers in LLM evaluation who are using or considering personality inventories for benchmarking and governance. A reader looking for evidence on whether these tests transfer would find the scale and the content validity section valuable. The variance and shift findings are solid enough to cite as observations.

I would recommend sending it for peer review. The data collection is substantial and the topic is timely, even though the interpretation of the factor result has this limitation that needs addressing in revision.

Referee Report

2 major / 1 minor

Summary. The paper conducts a psychometric evaluation of Big Five inventories on LLMs, addressing three questions on content validity, inter-model differences, and internal factor structure. Using N=244 models across 49 families, it reports that adapted items achieve content validity but original items do not; between-model variance accounts for only 3% of total score variance; LLM responses fail to recover the five-factor structure (four facets collapse with r >= .92); and alignment training shifts scores toward socially desirable traits. The authors conclude that Big Five scores do not measure an equivalent construct to human personality in LLMs and should not be used for benchmarking or governance without LLM-specific validation.

Significance. If the central empirical findings hold after addressing methodological concerns, the work provides a valuable cautionary demonstration against uncritical transfer of human psychometric tools to LLMs. The broad sampling across model families and direct comparison of base vs. instruction-tuned variants strengthen the case for reevaluating current practices in AI personality assessment. The study is a direct measurement effort without circular derivations, and its emphasis on developing LLM-native evaluation frameworks addresses a timely gap in the field.

major comments (2)

[Abstract / Results (research question c)] Abstract and results on research question (c): the claim that non-recovery of the five-factor structure demonstrates absence of an equivalent construct is undermined by the reported low between-model variance (only 3% of total score variance). Treating the 244 models as 'individuals' for factor analysis in a sample with such restricted range makes high inter-facet correlations (r >= .92) and facet collapse the statistically expected outcome even if the items tap distinct latent dimensions; the non-recovery cannot be unambiguously attributed to lack of the human-like construct rather than homogeneity of the model sample.
[Methods] Methods: the manuscript lacks full reporting of statistical procedures for the factor analysis, details on model sampling and selection criteria for the N=244 models, and error-bar or uncertainty quantification on the reported variance percentages and correlations. These omissions limit evaluation of whether the 3% between-model variance and factor results are robust.

minor comments (1)

[Abstract] Abstract: the phrasing 'LLMs responses did not recover' should be 'LLM responses did not recover' for grammatical consistency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment point by point below, indicating planned revisions where appropriate.

Referee: [Abstract / Results (research question c)] Abstract and results on research question (c): the claim that non-recovery of the five-factor structure demonstrates absence of an equivalent construct is undermined by the reported low between-model variance (only 3% of total score variance). Treating the 244 models as 'individuals' for factor analysis in a sample with such restricted range makes high inter-facet correlations (r >= .92) and facet collapse the statistically expected outcome even if the items tap distinct latent dimensions; the non-recovery cannot be unambiguously attributed to lack of the human-like construct rather than homogeneity of the model sample.

Authors: We agree that range restriction is a relevant statistical consideration when interpreting the factor-analytic results. The 3% between-model variance is itself a primary empirical result demonstrating that the inventories fail to capture meaningful differences across LLMs. We will revise the abstract and the discussion of research question (c) to explicitly note the potential contribution of restricted range to the observed facet collapse (r >= .92) while maintaining that the combination of negligible inter-model variance and failure to recover the expected structure constitutes evidence against construct equivalence. We will add a dedicated limitations paragraph addressing range restriction and its implications for factor recovery. This is a partial revision, as we refine the framing but do not alter the core conclusions.

revision: partial

Referee: [Methods] Methods: the manuscript lacks full reporting of statistical procedures for the factor analysis, details on model sampling and selection criteria for the N=244 models, and error-bar or uncertainty quantification on the reported variance percentages and correlations. These omissions limit evaluation of whether the 3% between-model variance and factor results are robust.

Authors: We thank the referee for identifying these reporting omissions. In the revised manuscript we will expand the Methods section to provide: (1) complete specification of the factor analysis (extraction method, rotation, and factor-retention criteria); (2) explicit sampling criteria, sources, and inclusion rules for the 244 models across 49 families; and (3) uncertainty estimates (e.g., bootstrap confidence intervals) for the variance-component percentages and inter-facet correlations. These additions will enable readers to evaluate the robustness of the reported results.

revision: yes
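The bootstrap confidence intervals mentioned in the rebuttal could look roughly like the sketch below, which resamples models with replacement and takes a percentile interval for the correlation between two facet scores. This is an illustration under assumptions, not the authors' planned analysis: the helper names, the 2000 resamples, and the synthetic facet scores are all hypothetical.

```python
import random
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def bootstrap_correlation_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and approximate percentile bootstrap interval for r."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample models (rows) with replacement, keeping facet pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[round((alpha / 2) * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return pearson_r(x, y), lower, upper


# Synthetic illustration: two facet scores for 244 hypothetical models
# that share a strong common component.
data_rng = random.Random(1)
shared = [data_rng.gauss(0.0, 1.0) for _ in range(244)]
facet_a = [s + 0.3 * data_rng.gauss(0.0, 1.0) for s in shared]
facet_b = [s + 0.3 * data_rng.gauss(0.0, 1.0) for s in shared]

r, lower, upper = bootstrap_correlation_ci(facet_a, facet_b)
print(f"r = {r:.3f}, 95% bootstrap interval = [{lower:.3f}, {upper:.3f}]")
```

Resampling whole models rather than individual scores keeps each model's facet scores paired, which matches the unit of analysis the referee asks about.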

Circularity Check

0 steps flagged

Empirical measurement study with no circular derivations or self-referential reductions

The paper performs direct empirical measurements: content validity assessment of inventories, administration to N=244 models, variance decomposition (reporting 3% between-model variance), and standard factor analysis on item responses. No equations, predictions, or fitted parameters are presented as independent results; all reported quantities are computed directly from the LLM response data. No self-citations are load-bearing for the central claims, and no uniqueness theorems or ansatzes are imported. The analysis is self-contained against external psychometric benchmarks.
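The Figure 6 caption refers to a scree plot with parallel analysis. For readers unfamiliar with that retention check, here is a minimal sketch of Horn's parallel analysis using NumPy, under the assumption of a models-by-items score matrix. The function name, the 95th-percentile threshold, the 200 simulations, and the one-factor synthetic data are choices made for this illustration and are not taken from the paper.

```python
import numpy as np


def parallel_analysis(data, n_sim=200, percentile=95, seed=0):
    """Compare observed correlation-matrix eigenvalues with those of random data.

    data: array of shape (n_observations, n_variables).
    Returns the observed eigenvalues, the random-data thresholds, and the
    number of leading components whose eigenvalue exceeds its threshold.
    """
    data = np.asarray(data, dtype=float)
    n_obs, n_vars = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sim, n_vars))
    for s in range(n_sim):
        noise = rng.standard_normal((n_obs, n_vars))
        eigenvalues = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[s] = np.sort(eigenvalues)[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)

    # Retain leading components until the first one that fails the comparison.
    exceeds = observed > threshold
    n_retain = n_vars if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_retain


# Synthetic illustration: 244 hypothetical models, 10 items driven by one factor.
rng = np.random.default_rng(0)
factor = rng.standard_normal((244, 1))
data = 0.8 * factor + 0.6 * rng.standard_normal((244, 10))

observed, threshold, n_retain = parallel_analysis(data)
print("components retained:", n_retain)
```

When a single component clears the random-data threshold, as in this synthetic case, the items behave as one dimension. That is the pattern the collapse of four facets describes.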

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the applicability of human psychometric validation techniques and factor-analytic assumptions to LLM text outputs without additional justification for non-human subjects.

axioms (1)

domain assumption: Standard psychometric procedures (content validity assessment and exploratory factor analysis) remain valid when applied to LLM-generated responses.
Invoked throughout the methods and results description in the abstract.


2025

[17] [17]

Journal of Personality , author =

An. Journal of Personality , author =. 1992 , pages =. doi:10.1111/j.1467-6494.1992.tb00970.x , language =

work page doi:10.1111/j.1467-6494.1992.tb00970.x 1992

[18] [18]

, year =

McAdams, Dan P. , year =. The emergence of personality , isbn =. Handbook of personality development , publisher =

[19] [19]

and Butcher, James N

Ben-Porath, Yossef S. and Butcher, James N. , editor =. The Historical Development of Personality Assessment , bookTitle =. 1991 , publisher =. doi:10.1007/978-1-4757-9715-2_5 , url =

work page doi:10.1007/978-1-4757-9715-2_5 1991

[20] [20]

2017 , publisher =

Personality Psychology: Domains of Knowledge about Human Nature , author =. 2017 , publisher =

2017

[21] [21]

2003 , publisher =

Personality Traits , author =. 2003 , publisher =

2003

[22] [22]

Human Behavior and Emerging Technologies , volume =

Rutinowski, Jérôme and Franke, Sven and Endendyk, Jan and Dormuth, Ina and Roidl, Moritz and Pauly, Markus , title =. Human Behavior and Emerging Technologies , volume =. doi:https://doi.org/10.1155/2024/7115633 , url =

work page doi:10.1155/2024/7115633 2024

[23] [23]

2024 , eprint =

Revisiting the Reliability of Psychological Scales on Large Language Models , author =. 2024 , eprint =

2024

[24] [24]

and Bojić, Ljubiša , title =

Bodroža, Bojana and Dinić, Bojana M. and Bojić, Ljubiša , title =. Royal Society Open Science , volume =. 2024 , doi =

2024

[25] [25]

2024 , eprint =

LLMs Simulate Big Five Personality Traits: Further Evidence , author =. 2024 , eprint =

2024

[26] [26]

2022 , eprint =

Discovering Language Model Behaviors with Model-Written Evaluations , author =. 2022 , eprint =

2022

[27] [27]

2024 , eprint =

Eliciting Personality Traits in Large Language Models , author =. 2024 , eprint =

2024

[28] [28]

How Personality Traits Shape

Hartley, John and Hamill, Conor Brian and Seddon, Dale and Batra, Devesh and Okhrati, Ramin and Khraishi, Raad , editor =. How Personality Traits Shape. Findings of the Association for Computational Linguistics: ACL 2025 , month = jul, year =. doi:10.18653/v1/2025.findings-acl.1085 , pages =

work page doi:10.18653/v1/2025.findings-acl.1085 2025

[29] [29]

Lechner and Claudia Wagner and Beatrice Rammstedt and Markus Strohmaier , title =

Max Pellert and Clemens M. Lechner and Claudia Wagner and Beatrice Rammstedt and Markus Strohmaier , title =. Perspectives on Psychological Science , volume =. 2024 , doi =

2024

[30] [30]

2025 , eprint =

AIPsychoBench: Understanding the Psychometric Differences between LLMs and Humans , author =. 2025 , eprint =

2025

[31] [31]

Shu, Bangzhao and Zhang, Lechen and Choi, Minje and Dunagan, Lavinia and Logeswaran, Lajanugen and Lee, Moontae and Card, Dallas and Jurgens, David , editor =. You don. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , month = jun, year =...

work page doi:10.18653/v1/2024.naacl-long.295 2024

[32] [32]

Developing a Personality Model for Speech-based Conversational Agents Using the Psycholexical Approach , url =

Völkel, Sarah Theres and Schödel, Ramona and Buschek, Daniel and Stachl, Clemens and Winterhalter, Verena and Bühner, Markus and Hussmann, Heinrich , year =. Developing a Personality Model for Speech-based Conversational Agents Using the Psycholexical Approach , url =. doi:10.1145/3313831.3376210 , booktitle =

work page doi:10.1145/3313831.3376210

[33] [33]

The Personality Dimensions GPT-3 Expresses During Human-Chatbot Interactions , year =

Kova. The Personality Dimensions GPT-3 Expresses During Human-Chatbot Interactions , year =. doi:10.1145/3659626 , journal =

work page doi:10.1145/3659626

[34] [34]

, title =

Goldberg, Lewis R. , title =. Psychological Assessment , year =. doi:10.1037/1040-3590.4.1.26 , publisher =

work page doi:10.1037/1040-3590.4.1.26

[35] [35]

2025 , eprint =

Personality Traits in Large Language Models , author =. 2025 , eprint =

2025

[36] [36]

Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024) , month = mar, year =

"LLM" Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models , author =. Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024) , month = mar, year =

2024

[37] [37]

2025 , eprint =

The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs , author =. 2025 , eprint =

2025

[38] [38]

2024 , eprint =

Is Self-knowledge and Action Consistent or Not: Investigating Large Language Model's Personality , author =. 2024 , eprint =

2024

[39] [39]

Self-report

Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots , author =. 2025 , eprint =

2025

[40] [40]

2024 , eprint =

PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits , author =. 2024 , eprint =

2024

[41] [41]

2024 , eprint =

Neuron-based Personality Trait Induction in Large Language Models , author =. 2024 , eprint =

2024

[42] [42]

2023 , eprint =

Large Language Models as Superpositions of Cultural Perspectives , author =. 2023 , eprint =

2023

[43] [43]

Chameleon

Xing, Jane and Niu, Tianyi and Srivastava, Shashank , editor =. Chameleon. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =. doi:10.18653/v1/2025.emnlp-main.875 , pages =

work page doi:10.18653/v1/2025.emnlp-main.875 2025

[44] [44]

description of personality

Goldberg, Lewis R. , year =. An alternative "description of personality":. Journal of Personality and Social Psychology , publisher =. doi:10.1037/0022-3514.59.6.1216 , number =

work page doi:10.1037/0022-3514.59.6.1216

[45] [45]

2025 , eprint =

PsyPlay: Personality-Infused Role-Playing Conversational Agents , author =. 2025 , eprint =

2025

[46] [46]

2025 , eprint =

BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data , author =. 2025 , eprint =

2025

[47] [47]

2025 , eprint =

PersLLM: A Personified Training Approach for Large Language Models , author =. 2025 , eprint =

2025

[48] [48]

2024 , eprint =

PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits , author =. 2024 , eprint =

2024

[49] [49]

Frontiers in Psychology , VOLUME =

Sartori, Giuseppe and Orrù, Graziella , TITLE =. Frontiers in Psychology , VOLUME =. 2023 , URL =. doi:10.3389/fpsyg.2023.1279317 , ISSN =

work page doi:10.3389/fpsyg.2023.1279317 2023

[50] [50]

Assessing the Impact of Chatbot-Human Personality Congruence on User Behavior: A Chatbot-Based Advising System Case , year =

Kuhail, Mohammad Amin and Bahja, Mohamed and Al-Shamaileh, Ons and Thomas, Justin and Alkazemi, Amina and Negreiros, Joao , journal =. Assessing the Impact of Chatbot-Human Personality Congruence on User Behavior: A Chatbot-Based Advising System Case , year =

[51] [51]

2024 , eprint =

The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents , author =. 2024 , eprint =

2024

[52] [52]

Chatbots With Attitude: Enhancing Chatbot Interactions Through Dynamic Personality Infusion , year =

Kova. Chatbots With Attitude: Enhancing Chatbot Interactions Through Dynamic Personality Infusion , year =. doi:10.1145/3640794.3665543 , booktitle =

work page doi:10.1145/3640794.3665543

[53] [53]

2022 , isbn =

Moilanen, Joonas and Visuri, Aku and Suryanarayana, Sharadhi Alape and Alorwu, Andy and Yatani, Koji and Hosio, Simo , title =. 2022 , isbn =. doi:10.1145/3568444.3568464 , booktitle =

work page doi:10.1145/3568444.3568464 2022

[54] [54]

2024 , isbn =

Lee, Jungjae and Choi, Yubin and Song, Minhyuk and Park, Sanghyun , title =. 2024 , isbn =. doi:10.1145/3640794.3665572 , booktitle =

work page doi:10.1145/3640794.3665572 2024

[55] [55]

Personality-Matched AI Chatbots: Measuring User Experience Based on Extraversion Scores , year =

S. Personality-Matched AI Chatbots: Measuring User Experience Based on Extraversion Scores , year =

[56] [56]

2026 , eprint=

Bowling with ChatGPT: On the Evolving User Interactions with Conversational AI Systems , author=. 2026 , eprint=

2026

[57] [57]

Journal of social issues , volume=

Machines and mindlessness: Social responses to computers , author=. Journal of social issues , volume=. 2000 , publisher=

2000

[58] [58]

INTERACT'93 and CHI'93 conference companion on Human factors in computing systems , pages=

Anthropomorphism, agency, and ethopoeia: computers as social actors , author=. INTERACT'93 and CHI'93 conference companion on Human factors in computing systems , pages=

[59] [59]

Ai & Society , volume=

The quest for appropriate models of human-likeness: anthropomorphism in media equation research , author=. Ai & Society , volume=. 2018 , publisher=

2018

[60] [60]

Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency , pages=

From" AI" to Probabilistic Automation: How Does Anthropomorphization of Technical Systems Descriptions Influence Trust? , author=. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency , pages=

2024

[61] [61]

, author=

The development of markers for the Big-Five factor structure. , author=. Psychological assessment , volume=. 1992 , publisher=

1992

[62] [62]

Proceedings of the National Academy of Sciences , volume=

The benefits and dangers of anthropomorphic conversational agents , author=. Proceedings of the National Academy of Sciences , volume=. 2025 , publisher=

2025

[63] [63]

Published as , year=

The Big-Five trait taxonomy: History, measurement, and theoretical perspectives , author=. Published as , year=

[64] [64]

description of personality

An alternative “description of personality”: The Big-Five factor structure , author=. Personality and personality disorders , pages=. 2013 , publisher=

2013

[65] [65]

, author=

Universal features of personality traits from the observer's perspective: data from 50 cultures. , author=. Journal of personality and social psychology , volume=. 2005 , publisher=

2005

[66] [66]

2003 , series =

Chapter 5 - Computers as persuasive social actors , booktitle =. 2003 , series =. doi:https://doi.org/10.1016/B978-155860643-2/50007-X , url =

work page doi:10.1016/b978-155860643-2/50007-x 2003

[67] [67]

2024 , eprint =

Evaluating Psychological Safety of Large Language Models , author =. 2024 , eprint =

2024

[68] [68]

2024 , url =

Python Language Reference , version =. 2024 , url =

2024

[69] [69]

van Rossum, Guido , title =

[70] [70]

and Srivastava, Sanjay , title =

John, Oliver P. and Srivastava, Sanjay , title =. Handbook of Personality: Theory and Research , editor =. 1999 , publisher =

1999

[71] [71]

Nursing research , volume=

Determination and quantification of content validity , author=. Nursing research , volume=. 1986 , publisher=

1986

[72] [72]

Research in Nursing & Health , author =

Is the. Research in Nursing & Health , author =. 2007 , pages =. doi:10.1002/nur.20199 , language =

work page doi:10.1002/nur.20199 2007

[73] [73]

2014 , publisher =

Gwet, Kilem Li , title =. 2014 , publisher =

2014

[74] [74]

Richard Landis and Gary G

J. Richard Landis and Gary G. Koch , journal =. The Measurement of Observer Agreement for Categorical Data , urldate =

[75] [75]

Schumacker, R. E. and Lomax, R. G. , title =

[76] [76]

Fitting Linear Mixed-Effects Models Using

Douglas Bates and Martin M. Fitting Linear Mixed-Effects Models Using. Journal of Statistical Software , year =

[77] [77]

2021 , url =

R: A Language and Environment for Statistical Computing , author =. 2021 , url =

2021

[78] [78]

McConochie , title =

William A. McConochie , title =. 2007 , note =

2007

[79] [79]

Johnson , keywords =

John A. Johnson , keywords =. Measuring thirty facets of the Five Factor Model with a 120-item public domain inventory: Development of the IPIP-NEO-120 , journal =. 2014 , issn =. doi:https://doi.org/10.1016/j.jrp.2014.05.003 , url =

work page doi:10.1016/j.jrp.2014.05.003 2014

[80] [80]

Gorsuch , title =

Richard L. Gorsuch , title =. 1983 , publisher =

1983