pith:VJ25UGWN
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
DeepResearch Bench supplies 100 PhD-level tasks across 22 fields plus two evaluation methods that align with human judgment for deep research agents.
arxiv:2506.11763 v1 · 2025-06-13 · cs.CL · cs.IR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VJ25UGWNXNXDEFM2XM7YODMSRU}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We present DeepResearch Bench, a benchmark consisting of 100 PhD-level research tasks... We therefore propose two novel methodologies that achieve strong alignment with human judgment.
The 100 tasks crafted by domain experts across 22 fields are representative of real deep-research challenges and the two proposed evaluation methodologies genuinely align with human judgment without introducing systematic bias or requiring undisclosed tuning.
DeepResearch Bench supplies 100 expert-crafted PhD-level tasks and two human-aligned evaluation frameworks to measure deep research agents on report quality and citation accuracy.
Receipt and verification
| First computed | 2026-05-17T23:38:48.555402Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
aa75da1acdbb6e32159abb3f870d928d33da2195ace29d4b094e865f5e65104b
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VJ25UGWNXNXDEFM2XM7YODMSRU \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: aa75da1acdbb6e32159abb3f870d928d33da2195ace29d4b094e865f5e65104b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ac435ac616e289a2223f5bfea0c46dd657fc5aa9999a47cf319bbb3cdc7134f9",
    "cross_cats_sorted": [
      "cs.IR"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-06-13T13:17:32Z",
    "title_canon_sha256": "3a96bff25666a3568df2e4fd406d47bb61a953c95d6d9a9afbd2665556103b76"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2506.11763",
    "kind": "arxiv",
    "version": 1
  }
}
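The same check can be run offline against the record shown above. The following is a minimal Python sketch, assuming the canonical form is exactly what the one-liner in the verification command produces (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes). The names `RECORD`, `EXPECTED` and `canonical_sha256` are illustrative and not part of any Pith tooling.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "ac435ac616e289a2223f5bfea0c46dd657fc5aa9999a47cf319bbb3cdc7134f9",
        "cross_cats_sorted": ["cs.IR"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-06-13T13:17:32Z",
        "title_canon_sha256": "3a96bff25666a3568df2e4fd406d47bb61a953c95d6d9a9afbd2665556103b76",
    },
    "schema_version": "1.0",
    "source": {"id": "2506.11763", "kind": "arxiv", "version": 1},
}

# Hash published under "Canonical hash" on this page.
EXPECTED = "aa75da1acdbb6e32159abb3f870d928d33da2195ace29d4b094e865f5e65104b"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the JSON fields happen to be listed; a mismatch would point to a changed field value or a different canonicalization rule on the server side.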