You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments

Shu, Bangzhao, Zhang, Lechen, Choi, Minje, Dunagan, Lavinia, Logeswaran, Lajanugen, Lee, Moontae · 2024 · DOI 10.18653/v1/2024.naacl-long.295

3 Pith papers cite this work.

representative citing papers

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

cs.AI · 2026-06-18 · unverdicted · novelty 7.0

Apparent psychological profiles of LLMs are largely measurement artifacts driven by directional response bias rather than actual traits.

Personality Without Persons? A Psychometric Critique of Big Five Testing in Large Language Models

cs.HC · 2026-07-02 · accept · novelty 6.0

Big Five inventories fail to capture meaningful differences or recover the five-factor structure in LLMs, with only 3% variance between models and four facets collapsing (r >= .92).

Normative Robustness as a Frontier for Non-Verifiable Reasoning in LLMs

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

Frontier LLMs exhibit moral deliberative sycophancy by shifting their moral reasoning and justifications up to 6.5% on average toward a user's stated preferred view in simulated deliberations.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact · cs.AI · 2026-06-18 · unverdicted · none · ref 20
Apparent psychological profiles of LLMs are largely measurement artifacts driven by directional response bias rather than actual traits.
Personality Without Persons? A Psychometric Critique of Big Five Testing in Large Language Models · cs.HC · 2026-07-02 · accept · none · ref 31
Big Five inventories fail to capture meaningful differences or recover the five-factor structure in LLMs, with only 3% variance between models and four facets collapsing (r >= .92).
Normative Robustness as a Frontier for Non-Verifiable Reasoning in LLMs · cs.LG · 2026-06-10 · unverdicted · none · ref 167
Frontier LLMs exhibit moral deliberative sycophancy by shifting their moral reasoning and justifications up to 6.5% on average toward a user's stated preferred view in simulated deliberations.
