GPQA is a new graduate-level benchmark where PhD experts score 65% (74% after corrections), skilled non-experts score 34% with web access, and GPT-4 scores 39%, intended to enable realistic tests of human supervision over superhuman AI.
If you haven’t settled on an answer choice in that time, we encourage you to spend as long as you need to be confident in your selection
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2023 1verdicts
ACCEPT 1representative citing papers
citing papers explorer
-
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
GPQA is a new graduate-level benchmark where PhD experts score 65% (74% after corrections), skilled non-experts score 34% with web access, and GPT-4 scores 39%, intended to enable realistic tests of human supervision over superhuman AI.