SeekerGym is a new benchmark that measures how completely AI agents retrieve information from full documents and how well they quantify uncertainty about missing parts, with top methods achieving only 42.5% recall on Wikipedia and 29.2% on ML surveys.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
SeekerGym: A Benchmark for Reliable Information Seeking