pith:YHPZV5IX
Measuring short-form factuality in large language models
The SimpleQA benchmark measures whether language models know what they know about short facts.
arxiv:2411.04368 v1 · 2024-11-07 · cs.CL
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{YHPZV5IXRF3GIHEWDFTPQWKAAB}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
SimpleQA is a simple, targeted evaluation for whether models 'know what they know,' and our hope is that this benchmark will remain relevant for the next few generations of frontier models.
Questions can be created so that each has only a single, indisputable answer, and collecting them adversarially against GPT-4 responses produces questions that remain challenging for future models.
SimpleQA is a new benchmark of short, single-answer factual questions collected adversarially against GPT-4 to evaluate LLM factuality and confidence calibration.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:53.221012Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
c1df9af5178976641c961966f8594000496af92ff9b0066c7067ad8d484a8e51
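The values on this page suggest that the Pith Number is derived directly from the canonical hash: encoding the SHA-256 digest in RFC 4648 base32 and keeping the first 26 characters (130 bits) reproduces `YHPZV5IXRF3GIHEWDFTPQWKAAB`, and the first 8 characters give the short form `pith:YHPZV5IX`. This relationship is inferred from this one record, not from a published Pith specification, and the helper `pith_number` below is hypothetical.

```python
import base64

# Canonical hash as printed on this record page.
CANONICAL_HASH = "c1df9af5178976641c961966f8594000496af92ff9b0066c7067ad8d484a8e51"


def pith_number(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: leading base32 characters of a hex SHA-256 digest.

    Assumption (inferred from this page, not a documented spec): the Pith
    Number is the first `length` characters of the RFC 4648 base32 encoding
    of the raw digest bytes; 26 characters cover the first 130 bits.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]  # padding '=' only appears at the end, so slicing is safe


full = pith_number(CANONICAL_HASH)
short = pith_number(CANONICAL_HASH, 8)
print(full)             # YHPZV5IXRF3GIHEWDFTPQWKAAB
print(f"pith:{short}")  # pith:YHPZV5IX
```

If this inference holds, anyone holding the canonical hash can check the Pith Number offline without calling the API.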
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YHPZV5IXRF3GIHEWDFTPQWKAAB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c1df9af5178976641c961966f8594000496af92ff9b0066c7067ad8d484a8e51
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "bc6590af27d293a121a6fce13fd12d46a8a316c7aaf3e54b2a59f21071aca0f6",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-11-07T01:58:42Z",
    "title_canon_sha256": "fd6601c4ac8b2d44a8b49a1794e90a34cc658b1e7eb5e0afc3ec76cf8436e8e7"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2411.04368",
    "kind": "arxiv",
    "version": 1
  }
}
```
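The verify command hashes the record after re-serializing it with sorted keys, compact separators, and `ensure_ascii=False`. The sketch below applies the same canonicalization offline to the record shown above. It assumes the displayed record is exactly the `canonical_record` object the endpoint serves; the function name `canonical_sha256` is illustrative, and the script reports a match or mismatch rather than assuming one.

```python
import hashlib
import json

# Canonical record copied from this page. Assumption: it is the same object the
# JSON-LD endpoint returns under "canonical_record".
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "bc6590af27d293a121a6fce13fd12d46a8a316c7aaf3e54b2a59f21071aca0f6",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-11-07T01:58:42Z",
        "title_canon_sha256": "fd6601c4ac8b2d44a8b49a1794e90a34cc658b1e7eb5e0afc3ec76cf8436e8e7",
    },
    "schema_version": "1.0",
    "source": {"id": "2411.04368", "kind": "arxiv", "version": 1},
}

# Published canonical hash from this page.
EXPECTED = "c1df9af5178976641c961966f8594000496af92ff9b0066c7067ad8d484a8e51"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then SHA-256 the UTF-8 bytes.

    Uses the same json.dumps arguments as the python3 step in the verify command.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches published hash" if digest == EXPECTED else "does not match published hash")
```

Because the record is re-serialized before hashing, indentation and key order in the displayed JSON do not affect the digest; only the values and their types do (for example, `"version": 1` as an integer hashes differently from `"version": "1"`).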