Pith Number

pith:J3VKQC7F

pith:2026:J3VKQC7FXT7NASHHFSJCYGAPOA
not attested · not anchored · not stored · refs pending

CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers

Dehao Huang, Derek F. Wong, Hexuan Deng, Min Zhang, Ruina Hu, Xiaopeng Ke, Xuebo Liu, Yichen Li, Yue Wang

CoCoReviewBench evaluates AI reviewers using expert discussions to measure both completeness and correctness.

arxiv:2605.07905 v2 · 2026-05-08 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{J3VKQC7FXT7NASHHFSJCYGAPOA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We introduce CoCoReviewBench, which curates 3,900 papers from ICLR and NeurIPS to enable reliable and fine-grained evaluation of AI reviewers. Analysis shows that AI reviewers remain limited in correctness and are prone to hallucinations, and highlights reasoning models as more effective reviewers.

C2 · weakest assumption

That reviewer-author-meta-review discussions provide reliable expert annotations for correctness and that skipping evaluation when human reviews are missing strengthens completeness without introducing selection bias.

C3 · one line summary

CoCoReviewBench curates 3,900 conference papers with category subsets and expert discussion annotations to evaluate AI reviewers on completeness and correctness. It shows that AI reviewers remain limited in correctness and prone to hallucinations, while reasoning models perform better.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:03:15.118293Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

4eeaa80be5bcfed048e72c922c180f701e906c39938c4ac5fb88ca3169349d0c

Aliases

arxiv: 2605.07905
arxiv_version: 2605.07905v2
doi: 10.48550/arxiv.2605.07905
pith_short_8: J3VKQC7F
pith_short_12: J3VKQC7FXT7N
pith_short_16: J3VKQC7FXT7NASHH
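
The short aliases listed above are plain prefixes of the full 26-character identifier. On this record, the identifier also coincides with the unpadded RFC 4648 base32 encoding of the first 16 bytes of the canonical hash. That this is how the builder derives it is an assumption inferred from this one record, not something the page states. A minimal Python sketch (the function names `pith_from_hash` and `short_aliases` are hypothetical):

```python
import base64

# Canonical hash copied from this record.
CANONICAL_HASH = "4eeaa80be5bcfed048e72c922c180f701e906c39938c4ac5fb88ca3169349d0c"


def pith_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, padding removed.

    Assumption: inferred from this single record; the rule is not documented here.
    """
    first_16 = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict[str, str]:
    """Return the 8-, 12- and 16-character prefixes listed under Aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith_number = pith_from_hash(CANONICAL_HASH)
print(pith_number)  # J3VKQC7FXT7NASHHFSJCYGAPOA
print(short_aliases(pith_number))
```

If the relationship holds for other records, a mirror could sanity-check an identifier against its hash without contacting the service. Treat a mismatch as a prompt to consult the schema rather than as proof of tampering.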
Agent API

Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/J3VKQC7FXT7NASHHFSJCYGAPOA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4eeaa80be5bcfed048e72c922c180f701e906c39938c4ac5fb88ca3169349d0c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "97759c00471726439f91b0dc9c86529e26d20069c11decbc6d156f8ca015e255",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-08T15:44:26Z",
    "title_canon_sha256": "c8fdf011241e80dba599a19096803b55e616e75fc5bd4704a2e699b06e7462ec"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.07905",
    "kind": "arxiv",
    "version": 2
  }
}
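
The shell pipeline above can also be reproduced offline against the record shown here. The Python sketch below applies the same serialization as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8) and hashes the result. The function name `canonical_sha256` is hypothetical. Whether the JSON displayed on this page is byte-for-byte what the API returns is an assumption, so the comparison is printed rather than asserted.

```python
import hashlib
import json

# Expected digest, copied from the "Canonical hash" field of this record.
EXPECTED = "4eeaa80be5bcfed048e72c922c180f701e906c39938c4ac5fb88ca3169349d0c"

# Canonical record, copied from the JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "97759c00471726439f91b0dc9c86529e26d20069c11decbc6d156f8ca015e255",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-08T15:44:26Z",
        "title_canon_sha256": "c8fdf011241e80dba599a19096803b55e616e75fc5bd4704a2e699b06e7462ec",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.07905", "kind": "arxiv", "version": 2},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which the record's fields are written does not affect the digest. Only a change to a key or value does.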