If you haven’t settled on an answer choice in that time, we encourage you to spend as long as you need to be confident in your selection

We ask that you spendat least 15 minutestrying to answer each question before making your selection

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

cs.AI · 2023-11-20 · accept · novelty 7.0

GPQA is a new graduate-level benchmark where PhD experts score 65% (74% after corrections), skilled non-experts score 34% with web access, and GPT-4 scores 39%, intended to enable realistic tests of human supervision over superhuman AI.

citing papers explorer

Showing 1 of 1 citing paper.

GPQA: A Graduate-Level Google-Proof Q&A Benchmark cs.AI · 2023-11-20 · accept · none · ref 44
GPQA is a new graduate-level benchmark where PhD experts score 65% (74% after corrections), skilled non-experts score 34% with web access, and GPT-4 scores 39%, intended to enable realistic tests of human supervision over superhuman AI.

If you haven’t settled on an answer choice in that time, we encourage you to spend as long as you need to be confident in your selection

fields

years

verdicts

representative citing papers

citing papers explorer