Introduces HumanStudy-Bench to evaluate LLM agents against 12 replicated human behavioral studies, finding that agent design affects alignment more than model scale does, with polarized outcomes.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CY (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Validated Hypotheses as a Lens for Human-Likeness Evaluation in AI Agents