Deep research agents exhibit widespread search-time contamination on six public benchmarks, with three defined leakage types inflating performance by up to 4%.
arXiv preprint arXiv:2602.01590
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Search-Time Contamination in Deep Research Agents: Measuring Performance Inflation in Public Benchmark Evaluation