pith:J3VKQC7F
CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
CoCoReviewBench evaluates AI reviewers using expert discussions to measure both completeness and correctness.
arxiv:2605.07905 v2 · 2026-05-08 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{J3VKQC7FXT7NASHHFSJCYGAPOA}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We introduce CoCoReviewBench, which curates 3,900 papers from ICLR and NeurIPS to enable reliable and fine-grained evaluation of AI reviewers. Analysis shows that AI reviewers remain limited in correctness and are prone to hallucinations, and it highlights reasoning models as more effective reviewers.
The benchmark rests on two premises: that reviewer-author-meta-review discussions provide reliable expert annotations for correctness, and that skipping evaluation when human reviews are missing strengthens completeness without introducing selection bias.
CoCoReviewBench curates 3,900 conference papers with category subsets and expert discussion annotations to evaluate AI reviewers on completeness and correctness, showing they are limited and prone to hallucinations while reasoning models perform better.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:03:15.118293Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
4eeaa80be5bcfed048e72c922c180f701e906c39938c4ac5fb88ca3169349d0c
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/J3VKQC7FXT7NASHHFSJCYGAPOA \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4eeaa80be5bcfed048e72c922c180f701e906c39938c4ac5fb88ca3169349d0c
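The same check can be sketched as a small Python module, which may be easier to reuse than the one-liner. This is a minimal sketch that assumes the canonicalization is exactly what the command above does: re-serialize the record with sorted keys, compact separators, and no ASCII escaping, then take the SHA-256 of the UTF-8 bytes. The function names `canonical_sha256` and `matches_canonical_hash` are illustrative and not part of any Pith tooling.

```python
import hashlib
import hmac
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record as the shell one-liner does.

    Assumption: canonical form = JSON with sorted keys, compact
    separators (',' and ':'), ensure_ascii=False, encoded as UTF-8.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def matches_canonical_hash(record: dict, expected_hex: str) -> bool:
    """Compare the computed digest with the published hex digest."""
    computed = canonical_sha256(record)
    # compare_digest gives a constant-time comparison of the two hex strings.
    return hmac.compare_digest(computed, expected_hex.strip().lower())


if __name__ == "__main__":
    # Key order in the input does not affect the digest,
    # because keys are sorted before hashing.
    a = {"schema_version": "1.0", "source": {"id": "x", "kind": "arxiv"}}
    b = {"source": {"kind": "arxiv", "id": "x"}, "schema_version": "1.0"}
    print(canonical_sha256(a) == canonical_sha256(b))
```

To check this record, load the `canonical_record` object returned by the endpoint with `json.loads` and pass it to `matches_canonical_hash` together with the canonical hash listed above.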
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "97759c00471726439f91b0dc9c86529e26d20069c11decbc6d156f8ca015e255",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-08T15:44:26Z",
    "title_canon_sha256": "c8fdf011241e80dba599a19096803b55e616e75fc5bd4704a2e699b06e7462ec"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.07905",
    "kind": "arxiv",
    "version": 2
  }
}