pith:ODHAUUJ4
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
SuperGPQA benchmark shows top LLMs reach only 61.82 percent accuracy across 285 graduate disciplines.
arxiv:2502.14739 v4 · 2025-02-20 · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ODHAUUJ4W6KXXLMBS5A5DX6NNE}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence.
This conclusion rests on the assumption that the Human-LLM collaborative filtering process produces questions that are genuinely graduate-level, unambiguous, and representative of each discipline, without introducing selection bias or over-filtering difficult items.
SuperGPQA is a new benchmark that tests LLMs on graduate questions from 285 disciplines after human-LLM filtering, with current best models scoring 61.82 percent.
Receipt and verification
| First computed | 2026-05-17T23:38:49.562806Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
70ce0a513cb7957bad819741d1dfcd6919d880547dbe7becf8ab4e2b15317b7d
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ODHAUUJ4W6KXXLMBS5A5DX6NNE \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 70ce0a513cb7957bad819741d1dfcd6919d880547dbe7becf8ab4e2b15317b7d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5e4221a4235efe16896596b5e18106ddc45ba53a8f135b2e1478ccf4344aabd6",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-02-20T17:05:58Z",
    "title_canon_sha256": "3d35ab16412f2a2d744cf7d1cdecfbf17234113ad2edc0b1e829f639d42a3ab9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.14739",
    "kind": "arxiv",
    "version": 4
  }
}
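The Python one-liner in the verification command can be unpacked into a small offline sketch. It re-serializes the canonical record with sorted keys, compact separators, and UTF-8 encoding (the same options the one-liner passes to `json.dumps`), then takes the SHA-256 digest. The names `RECORD`, `EXPECTED`, `canonical_bytes`, and `canonical_hash` are illustrative and not part of any Pith API. The record is copied from this page, so whether the digest matches depends on that copy being byte-faithful to what the server returns.

```python
import hashlib
import json

# Canonical record copied from this page (assumed identical to the served record).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "5e4221a4235efe16896596b5e18106ddc45ba53a8f135b2e1478ccf4344aabd6",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-02-20T17:05:58Z",
        "title_canon_sha256": "3d35ab16412f2a2d744cf7d1cdecfbf17234113ad2edc0b1e829f639d42a3ab9",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.14739", "kind": "arxiv", "version": 4},
}

# Canonical hash as listed on this page.
EXPECTED = "70ce0a513cb7957bad819741d1dfcd6919d880547dbe7becf8ab4e2b15317b7d"


def canonical_bytes(record):
    """Serialize with sorted keys and compact separators, then encode as UTF-8."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record):
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the JSON that was fetched, which is why the shell pipeline can pass `jq` output straight into the hash step.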