pith:Y6C2V5CY
Permutation-Consensus Listwise Judging for Robust Factuality Evaluation
Rerunning listwise factuality prompts over multiple candidate orderings and aggregating the results produces up to 7-point gains over direct judging.
arxiv:2603.20562 v3 · 2026-03-20 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Y6C2V5CYAVWDDJNFS7DHQZLSMN}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
On RewardBench 2 Factuality, PCFJudge improves over direct judging by up to 7 absolute points.
Underlying assumption: order sensitivity is a dominant, fixable source of error in listwise factuality judging, and simple aggregation over permutations neither introduces compensating biases nor reduces discriminative power.
PCFJudge improves direct LLM factuality judging by up to 7 points on RewardBench 2 Factuality by aggregating results over multiple candidate permutations.
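The idea in these claims, rerunning a listwise judge over several candidate orderings and aggregating, can be illustrated with a minimal sketch. This is not the paper's implementation: the record does not state the aggregation rule, so plain majority voting is assumed here, and `permutation_consensus` and `biased_judge` are hypothetical names. The toy judge stands in for an LLM call and is deliberately order-sensitive.

```python
import itertools
import random
from collections import Counter
from typing import Callable, Optional, Sequence


def permutation_consensus(
    candidates: Sequence[str],
    judge: Callable[[Sequence[str]], int],
    num_orderings: Optional[int] = None,
    seed: int = 0,
) -> int:
    """Return the original index of the candidate picked most often.

    `judge` receives the candidates in some order and returns the POSITION
    it prefers. With num_orderings=None every permutation is tried;
    otherwise that many random orderings are sampled.
    Majority voting is an assumption, not the paper's stated rule.
    """
    indices = list(range(len(candidates)))
    if num_orderings is None:
        orderings = [list(p) for p in itertools.permutations(indices)]
    else:
        rng = random.Random(seed)
        orderings = [rng.sample(indices, len(indices)) for _ in range(num_orderings)]

    votes = Counter()
    for order in orderings:
        picked_position = judge([candidates[i] for i in order])
        # Map the picked position back to the candidate's original index.
        votes[order[picked_position]] += 1

    top = max(votes.values())
    # Break ties deterministically by lowest original index.
    return min(i for i, v in votes.items() if v == top)


def biased_judge(ordered: Sequence[str]) -> int:
    """Toy judge: finds the right answer unless it is listed last."""
    pos = next(i for i, text in enumerate(ordered) if "correct" in text)
    return 0 if pos == len(ordered) - 1 else pos


candidates = ["A wrong", "B wrong", "C wrong", "D correct"]
print(biased_judge(candidates))                       # direct judging picks position 0
print(permutation_consensus(candidates, biased_judge))  # consensus picks index 3
```

In this toy setup a single pass fails because the correct candidate happens to be listed last, while the vote over all 24 orderings recovers it (18 of 24 votes).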
Receipt and verification
| First computed | 2026-05-20T00:02:10.923891Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c785aaf458056c31a5a597c6786572636398b664b0be27ddace31cb301feb61a
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y6C2V5CYAVWDDJNFS7DHQZLSMN \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c785aaf458056c31a5a597c6786572636398b664b0be27ddace31cb301feb61a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6c6d8cdc9a91a96e54dd7e7019e29a6d11d9294c1c8bc21410f8d5a52ad293fe",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-03-20T23:35:14Z",
    "title_canon_sha256": "8d8d32cb18b503d168f265f111f7598b03d48c127137e03d8956c95ae4504a5f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.20562",
    "kind": "arxiv",
    "version": 3
  }
}
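The hashing step of the shell pipeline under "Verify this Pith Number yourself" can also be run offline against the record above. The sketch below mirrors the one-liner's serialization settings (sorted keys, compact separators, no ASCII escaping, UTF-8); `canonical_sha256` is a hypothetical helper name, and whether the digest matches the published canonical hash depends on the served record being identical to the one shown here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 of the record serialized the same way as the shell one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as shown on this page.
record = json.loads("""
{
  "metadata": {
    "abstract_canon_sha256": "6c6d8cdc9a91a96e54dd7e7019e29a6d11d9294c1c8bc21410f8d5a52ad293fe",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-03-20T23:35:14Z",
    "title_canon_sha256": "8d8d32cb18b503d168f265f111f7598b03d48c127137e03d8956c95ae4504a5f"
  },
  "schema_version": "1.0",
  "source": {"id": "2603.20562", "kind": "arxiv", "version": 3}
}
""")

digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed above
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the JSON text.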