Pith Number

pith:TFPELXQJ

pith:2026:TFPELXQJV7S76OZVXU2BLQVJWB
Status: not attested · not anchored · not stored · refs resolved

Validated Hypotheses as a Lens for Human-Likeness Evaluation in AI Agents

Guankai Zhai, Haojian Jin, Haoyang Shang, Xuan Liu, Yiwen Tu, Yuanjun Feng, Yunze Xiao, Zizhang Liu

If AI agents are human-like, populations of them should reach the same conclusions as humans on established social science experiments.

arxiv:2605.15473 v1 · 2026-05-14 · cs.CY

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TFPELXQJV7S76OZVXU2BLQVJWB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
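The page does not spell out the merge algorithm. As a hypothetical illustration of why determinism matters for mirrors: if every mirror sorts the same set of events by one total order before folding them into the canonical record, each one derives identical state no matter the order in which the events arrived. The `Event` fields, the `(ts, id)` ordering, and the last-writer-wins rule are assumptions made for this sketch, and signature checking is omitted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real bundle schema is not described on this page.
    ts: str     # ISO-8601 UTC timestamp; same-format strings sort chronologically
    id: str     # unique event id, used only to break timestamp ties
    field: str  # record field the event updates
    value: str  # new value for that field


def merge(canonical: dict, events: Iterable[Event]) -> dict:
    """Fold events into a copy of the canonical record in one fixed total order."""
    state = dict(canonical)
    # Sorting by (ts, id) makes the result independent of the order in which
    # a mirror happened to receive the events.
    for ev in sorted(events, key=lambda e: (e.ts, e.id)):
        state[ev.field] = ev.value  # assumed policy: last writer wins
    return state


# Hypothetical events, deliberately listed out of chronological order.
events = [
    Event("2026-01-02T00:00:00Z", "e2", "status", "second"),
    Event("2026-01-01T00:00:00Z", "e1", "status", "first"),
]
a = merge({}, events)
b = merge({}, reversed(events))
print(a == b, a["status"])  # True second
```

Sorting on a composite key rather than on timestamp alone matters: two events with the same timestamp would otherwise be applied in arrival order, and mirrors could diverge.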

Claims

C1 · strongest claim

If an agent is human-like, a population of such agents should reach the same inferential conclusion as the human population when run through the same experiment.
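C1 compares conclusions, not raw numbers. A minimal sketch of that comparison, assuming each study reduces to a signed effect estimate and a p-value, and that a "conclusion" means significance at α = 0.05 together with the effect's direction. The names, threshold, and data are hypothetical; HumanStudy-Bench's own scoring may differ (the excerpts in the reference list point to a Bayesian treatment of binary truth states).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyResult:
    effect: float   # signed effect estimate, e.g. a difference in means
    p_value: float  # p-value of the test of the hypothesized effect


def conclusion(r: StudyResult, alpha: float = 0.05) -> int:
    """+1 or -1 for a significant effect in that direction, 0 for no effect found."""
    if r.p_value >= alpha or r.effect == 0:
        return 0
    return 1 if r.effect > 0 else -1


def agreement_rate(pairs: list[tuple[StudyResult, StudyResult]], alpha: float = 0.05) -> float:
    """Share of studies where the agent population reaches the human conclusion."""
    matches = sum(conclusion(h, alpha) == conclusion(a, alpha) for h, a in pairs)
    return matches / len(pairs)


# Hypothetical (human, agent-population) results for three studies.
pairs = [
    (StudyResult(0.42, 0.001), StudyResult(0.30, 0.004)),   # same significant direction
    (StudyResult(0.25, 0.010), StudyResult(-0.10, 0.020)),  # opposite directions
    (StudyResult(0.05, 0.400), StudyResult(0.02, 0.600)),   # both null
]
print(round(agreement_rate(pairs), 3))  # 0.667
```

Note the design choice: the first pair counts as agreement even though the effect sizes differ (0.42 vs 0.30), because C1 asks whether the populations reach the same inferential conclusion, not whether they produce the same numbers.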

C2 · weakest assumption

That the curated set of validated social science hypotheses and their experimental protocols can be faithfully turned into simulation environments that isolate the relevant behavioral signals without introducing artifacts from the agent implementation.

C3 · one-line summary

Introduces HumanStudy-Bench to evaluate LLM agents against 12 replicated human behavioral studies, finding that agent design affects alignment more than model scale does, with polarized outcomes.

References

22 extracted · 22 resolved · 0 Pith anchors

[1] PersonaLLM: Investigating the ability of large language models to express personality traits · 2024 · doi:10.18653/v1/2024.findings-naacl.229
[2] The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for Large Language Models · 2026 · doi:10.18653/v1/2025.findings-emnlp.1261
[3] For a binary state $\theta$, the distribution maximizing Shannon Entropy $H(\theta)$ is the uniform distribution, $P(\theta = 1) = P(\theta = 0) = 0.5$ · 2003
[4] Minimizing Bayes Risk. We define the loss function as the Squared Error Loss with respect to the true alignment $A^*$: $L(\hat{A}, A^*) = (\hat{A} - A^*)^2$. The optimal estimator that minimizes the Bayes Risk (Expect…
[5] Perspective II: The Frequentist View (Variance Reduction). In the Frequentist ontology, the latent truth states $\theta_h, \theta_a \in \{0, 1\}$ are fixed unknown constants
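Entries [3] and [4] are excerpts of the paper's own derivation rather than bibliographic references. The sketch below illustrates the two standard facts they invoke: a Bernoulli state has maximal entropy at p = 0.5, and under squared-error loss the Bayes-optimal estimate is the posterior mean. Modeling the true alignment A* as the indicator that θ_h = θ_a, with independent posteriors for the human and agent states, is an assumption for illustration only; the excerpt in [4] is truncated before the paper states its estimator.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a binary state theta with P(theta = 1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def posterior_alignment(p_h: float, p_a: float) -> float:
    """E[A*] with A* = 1 if theta_h == theta_a (assumed independent posteriors).

    p_h = P(theta_h = 1 | data), p_a = P(theta_a = 1 | data).
    """
    return p_h * p_a + (1 - p_h) * (1 - p_a)


def expected_sq_loss(a_hat: float, p_align: float) -> float:
    """Posterior expected squared error (a_hat - A*)^2 for a binary A*."""
    return p_align * (a_hat - 1) ** 2 + (1 - p_align) * a_hat ** 2


grid = [i / 100 for i in range(101)]
print(max(grid, key=binary_entropy))  # 0.5

# Uniform (maximum-entropy) priors on both states give an uninformative 0.5.
print(posterior_alignment(0.5, 0.5))  # 0.5

p = posterior_alignment(0.9, 0.8)  # hypothetical posteriors
best = min(grid, key=lambda a: expected_sq_loss(a, p))
print(best, round(p, 2))  # 0.74 0.74
```

The grid search lands on the posterior mean because the expected loss expands to a² − 2pa + p, a parabola minimized at a = p.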

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:01:00.409258Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

995e45de09afe5ff3b35bd3415c2a9b07622b776e03d000a55427db86371c43c

Aliases

arxiv: 2605.15473 · arxiv_version: 2605.15473v1 · doi: 10.48550/arxiv.2605.15473 · pith_short_12: TFPELXQJV7S7 · pith_short_16: TFPELXQJV7S76OZV · pith_short_8: TFPELXQJ
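The short aliases are prefixes of the full Pith Number. The full number, in turn, matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash shown above. The page does not document this relationship; the sketch below is inferred from the values shown and works as a quick consistency check, not as a specification.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "995e45de09afe5ff3b35bd3415c2a9b07622b776e03d000a55427db86371c43c"


def pith_number(hex_digest: str, length: int = 26) -> str:
    """RFC 4648 base32 (A-Z, 2-7) of the raw digest bytes, cut to `length` chars.

    The 26-character length and this derivation are inferred from the values on
    this page, not taken from a published specification.
    """
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


full = pith_number(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)     # TFPELXQJV7S76OZVXU2BLQVJWB
print(aliases)  # {'pith_short_8': 'TFPELXQJ', 'pith_short_12': 'TFPELXQJV7S7', 'pith_short_16': 'TFPELXQJV7S76OZV'}
```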
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TFPELXQJV7S76OZVXU2BLQVJWB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 995e45de09afe5ff3b35bd3415c2a9b07622b776e03d000a55427db86371c43c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e1329aa40f709c70993e24f7f15b84baec69861632666930476cc27656d1f143",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CY",
    "submitted_at": "2026-05-14T23:25:02Z",
    "title_canon_sha256": "95015454615e0312c83eff4559a37d63f712a82f5058587951e0584b8532d647"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15473",
    "kind": "arxiv",
    "version": 1
  }
}
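The verification one-liner can also be reused as a small offline helper. This sketch applies the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`) and shows why they matter: two spellings of one record that differ only in key order and whitespace hash identically. The helper names and the sample record are illustrative, not part of any Pith API.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    # Same settings as the verification one-liner: sorted keys, no insignificant
    # whitespace, and non-ASCII characters kept as raw UTF-8 rather than \u escapes.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Two spellings of the same (hypothetical) record: key order and spacing differ.
a = json.loads('{"source": {"kind": "arxiv", "id": "x"}, "title": "θ"}')
b = json.loads('{ "title": "θ",  "source": { "id": "x", "kind": "arxiv" } }')
print(canonical_bytes(a).decode("utf-8"))  # {"source":{"id":"x","kind":"arxiv"},"title":"θ"}
print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check this page, pass `json.loads` of the canonical record JSON above to `canonical_sha256` and compare the result with the canonical hash; a match requires the served record to be identical to this copy.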