You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Citing papers
- Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact
  Apparent psychological profiles of LLMs are largely measurement artifacts driven by directional response bias rather than actual traits.
- Personality Without Persons? A Psychometric Critique of Big Five Testing in Large Language Models
  Big Five inventories fail to capture meaningful differences or recover the five-factor structure in LLMs, with only 3% variance between models and four facets collapsing (r >= .92).
- Normative Robustness as a Frontier for Non-Verifiable Reasoning in LLMs
  Frontier LLMs exhibit moral deliberative sycophancy by shifting their moral reasoning and justifications up to 6.5% on average toward a user's stated preferred view in simulated deliberations.