SuperGPQA is a new benchmark that tests LLMs on graduate-level questions from 285 disciplines, curated through human-LLM filtering; the best current models score 61.82 percent.
If any part is missing or unclear (e.g., if the answer does not match any of the listed options), the question should be deemed invalid.
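The rule quoted above can be illustrated with a minimal sketch. It covers only the parts that can be checked mechanically (a missing question text, missing options, or an answer that matches none of the listed options); judging whether a question is "unclear" still needs a human or model reviewer. The `Question` class, its field names, and the `is_valid` function are hypothetical names chosen for illustration, not taken from the benchmark's own tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """Hypothetical record for one multiple-choice question."""
    text: str
    options: list = field(default_factory=list)
    answer: str = ""


def is_valid(q: Question) -> bool:
    """Return False if any part is missing or the answer matches no option."""
    # A question with no text is missing a required part.
    if not q.text.strip():
        return False
    # Options must exist and none of them may be blank.
    if not q.options or any(not opt.strip() for opt in q.options):
        return False
    # The answer must match one of the listed options (whitespace ignored).
    return q.answer.strip() in {opt.strip() for opt in q.options}
```

For example, `is_valid(Question("What is 2 + 2?", ["3", "5"], "4"))` returns `False`, because the answer is not among the listed options.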
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines