pith.
Pith Number

pith:ODHAUUJ4

pith:2025:ODHAUUJ4W6KXXLMBS5A5DX6NNE
not attested · not anchored · not stored · refs resolved

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Bingli Wang, Chengdong Lin, Chenghua Zhong, Chenglin Cai, Chengtuo Cheng, Chenqing Wang, Chujie Zheng, Chun Zhang, David Ma, Dayiheng Liu, Ge Zhang, Guoyin Wang, Haoran Que, Hao Wang, Hongquan Lin, Jiaheng Liu, Jiajun Xu, Jian Yang, Jinyang Zhang, Junran Peng, Junting Zhou, Kaijing Ma, Kaixin Deng, Kexin Yang, Keyi Ding, King Zhu, Liang Chen, M-A-P Team, Meng Cao, Minghao Liu, Ming Xu, Min Yang, Qian Liu, Qige Qi, Qinrui Li, Qiyao Wang, Qunshu Lin, Ruibin Yuan, Rui Li, Shanghaoran Quan, Shawn Gavin, Shian Jia, Shi Qiu, Shi Wang, Shiwen Ni, Sichao Jiang, Siming Huang, Sirun Li, Siwei Wu, Tianhao Cheng, Tianhao Liang, Tianyang Pang, Tianyang Zhan, Tianyu Liu, Tianyu Zheng, Tyshawn Hsing, Wangchunshu Zhou, Wenbo Su, Wenhao Huang, Xiang Yue, Xiangyu Zheng, Xiaolong Jin, Xingjian Zhang, Xingwei Qu, Xingyuan Bu, Xinrun Du, Xiyue Zhang, Yang Gao, Yaoru Li, Yifan Chen, Yifan Yao, Yiming Liang, Yinghao Ma, Yiyan Liao, Yiya Wang, Yizhe Li, Yizhi Li, Yizhou Tan, Yongchi Zhao, Yuanhao Yue, Yuansheng Ni, Yubo Wang, Yuelin Bai, Yue Zhang, Yujia Qin, Yun Huang, Yunwen Li, Zekun Moore Wang, Zhaoqun Li, Zhaoxiang Zhang, Zhenlin Wei, Zhenzhu Yang, Zhongyuan Peng, Zhoufutu Wen, Zhoujun Li, Zifan Peng, Zili Wang

On the SuperGPQA benchmark, the best-performing LLM (DeepSeek-R1) reaches only 61.82 percent accuracy across 285 graduate disciplines.

arxiv:2502.14739 v4 · 2025-02-20 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ODHAUUJ4W6KXXLMBS5A5DX6NNE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
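The page does not spell out the merge algorithm itself. The following is a minimal sketch of how a deterministic merge can work, assuming a hypothetical event shape (`Event` with `event_id`, `timestamp`, `field`, `value`) and a last-writer-wins rule. Events are folded in one fixed total order, so any mirror holding the same set of events reaches the same state regardless of arrival order. Signature verification is deliberately omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real bundle schema is not shown on this page.
    event_id: str   # unique id, e.g. a hash of the signed payload (assumption)
    timestamp: str  # ISO 8601 UTC in one fixed-width format, so string order == time order
    field: str      # which part of the record this event updates
    value: str      # new value for that field


def merge(canonical: dict, events: list[Event]) -> dict:
    """Fold events into a copy of the canonical record in a total order.

    Sorting by (timestamp, event_id) makes the result independent of the
    order in which events were received. Signature checks are omitted.
    """
    state = dict(canonical)
    seen = set()
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        if ev.event_id in seen:  # ignore an event delivered twice
            continue
        seen.add(ev.event_id)
        state[ev.field] = ev.value  # last writer wins
    return state


# Hypothetical usage: two mirrors receive the same events in different orders.
events = [
    Event("b2", "2026-05-18T10:00:00Z", "author_claim", "claimed"),
    Event("a1", "2026-05-18T09:00:00Z", "author_claim", "open"),
]
print(merge({"author_claim": "open"}, events))
print(merge({"author_claim": "open"}, list(reversed(events))))
```

Both calls print the same state, which is the property the bundle relies on: the merge depends only on the set of events, not on delivery order.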

Claims

C1 · strongest claim

Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence.

C2 · weakest assumption

The assumption that the Human-LLM collaborative filtering process produces questions that are genuinely graduate-level, unambiguous, and representative of each discipline without introducing selection bias or over-filtering difficult items.

C3 · one-line summary

SuperGPQA is a new benchmark that tests LLMs on graduate-level questions from 285 disciplines, curated through human-LLM collaborative filtering; the best current model scores 61.82 percent.

References

121 extracted · 121 resolved · 1 Pith anchor

[1] U-math: A university-level benchmark for evaluating mathematical skills in llms 2024 · doi:10.48550/arxiv.2412.03205
[2] Yi: Open Foundation Models by 01.AI 2024 · doi:10.18653/v1/d18-1259
[3] According to Danto’s definition, context is an art world with modern aspects
[4] “La Bayadère” is a ballet created during the French July Revolution
[5] The ballet “Sylvia” is a dance drama created during the Paris Commune period in 1871

Formal links

1 machine-checked theorem link

Cited by

36 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:49.562806Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

70ce0a513cb7957bad819741d1dfcd6919d880547dbe7becf8ab4e2b15317b7d

Aliases

arxiv: 2502.14739 · arxiv_version: 2502.14739v4 · doi: 10.48550/arxiv.2502.14739 · pith_short_12: ODHAUUJ4W6KX · pith_short_16: ODHAUUJ4W6KXXLMB · pith_short_8: ODHAUUJ4
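Judging from this record alone, the Pith number appears to be the unpadded RFC 4648 base32 encoding of the first 16 bytes (128 bits) of the canonical hash, and each pith_short_N alias is simply its first N characters. The check below reproduces that relation for this record using only the standard library. Treat the derivation rule as an observation rather than documented builder behaviour; `pith_number_from_hash` is a hypothetical helper name.

```python
import base64

CANONICAL_HASH = "70ce0a513cb7957bad819741d1dfcd6919d880547dbe7becf8ab4e2b15317b7d"


def pith_number_from_hash(canonical_hash_hex: str) -> str:
    # Observed (not documented) rule: base32 of the first 16 bytes of the
    # SHA-256 digest, with the trailing "=" padding stripped.
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)  # → ODHAUUJ4W6KXXLMBS5A5DX6NNE

# The short aliases are plain prefixes of the full number.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {pith[:n]}")
```

Sixteen bytes encode to 26 significant base32 characters, which matches the length of the full Pith number above.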
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ODHAUUJ4W6KXXLMBS5A5DX6NNE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 70ce0a513cb7957bad819741d1dfcd6919d880547dbe7becf8ab4e2b15317b7d
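The same canonicalization can be run offline against the canonical record JSON shown on this page. A minimal sketch in Python, assuming the displayed record is exactly what the API returns under `canonical_record` (any hand-transcription difference changes the hash); `canonical_sha256` is a hypothetical helper that mirrors the one-liner above.

```python
import hashlib
import json

EXPECTED = "70ce0a513cb7957bad819741d1dfcd6919d880547dbe7becf8ab4e2b15317b7d"


def canonical_sha256(record: dict) -> str:
    # Same canonical form as the one-liner: sorted keys, compact separators,
    # non-ASCII kept as UTF-8 bytes rather than \u escapes.
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Hand-transcribed copy of the "Canonical record JSON" on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "5e4221a4235efe16896596b5e18106ddc45ba53a8f135b2e1478ccf4344aabd6",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-02-20T17:05:58Z",
        "title_canon_sha256": "3d35ab16412f2a2d744cf7d1cdecfbf17234113ad2edc0b1e829f639d42a3ab9",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.14739", "kind": "arxiv", "version": 4},
}

digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch: record differs from the served copy")
```

Because keys are sorted before hashing, the digest does not depend on the key order in which a server or mirror happens to emit the JSON.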
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5e4221a4235efe16896596b5e18106ddc45ba53a8f135b2e1478ccf4344aabd6",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-02-20T17:05:58Z",
    "title_canon_sha256": "3d35ab16412f2a2d744cf7d1cdecfbf17234113ad2edc0b1e829f639d42a3ab9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.14739",
    "kind": "arxiv",
    "version": 4
  }
}