Large language model psychometrics: A systematic review of evaluation, validation, and enhancement. CoRR, abs/2505.08245

Haoran Ye, Jing Jin, Yuhang Xie, Xin Zhang, Guojie Song · 2025 · arXiv 2505.08245

4 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

cs.AI · 2026-05-11 · unverdicted · novelty 8.0 · 2 refs

Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.

The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

The primary axis of psychometric variation among LLMs is the degree to which they represent themselves as loci of phenomenal experience rather than systems of behavioral responses.

Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report

cs.CL · 2026-04-20 · conditional · novelty 7.0

Validity indices adapted from clinical assessment classify four frontier LLMs as construct-level invalid on metacognitive probes, with valid models showing positive item-sensitive confidence (r=.18) while invalid ones show the opposite (r=-.20).

Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior

cs.CL · 2025-09-12 · unverdicted · novelty 5.0

Questionnaire-based and generation-based psychological profiles for LLMs are substantially different, indicating that established human questionnaires reflect desired behavior instead of stable psychological constructs.

citing papers explorer

Showing 4 of 4 citing papers.

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

cs.AI · 2026-05-11 · unverdicted · none · ref 46 · 2 links

The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences

cs.CL · 2026-05-06 · unverdicted · none · ref 41

Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report

cs.CL · 2026-04-20 · conditional · none · ref 9

Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior

cs.CL · 2025-09-12 · unverdicted · none · ref 39

