pith:Q2XUZPOL
Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
Multi-level bootstrapping models annotator variance to find the number of items (N) and responses per item (K) needed for statistically significant evaluations.
arxiv:2605.13801 v1 · 2026-05-13 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Q2XUZPOLM4F32YDX64DCHRK4EG}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We introduce a multi-level bootstrapping approach to realistically model annotator behavior. Leveraging datasets with a large number of ratings and persistent rater identifiers, we analyze the tradeoffs between the number of items (N) and the number of responses per item (K) required to achieve statistical significance.
The approach assumes that datasets containing large numbers of ratings per item together with persistent rater identifiers are available and representative of typical evaluation settings, and that multi-level bootstrapping accurately captures real annotator variance without introducing new artifacts.
Multi-level bootstrapping models annotator variance using large rater-ID datasets to find optimal tradeoffs between number of items N and ratings per item K for statistically significant AI evaluations.
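As a rough illustration of the N versus K tradeoff described in these claims, the sketch below runs a generic two-level bootstrap: it resamples items first, then resamples responses within each sampled item, and records the mean score difference between two systems. It is not the paper's procedure. The function names, the input layout (one list of ratings per item for each system), and the choice to ignore rater identity are all assumptions made for brevity; a model that uses persistent rater identifiers would need an extra resampling level.

```python
import random
from statistics import mean


def two_level_bootstrap(scores_a, scores_b, n_items, k_responses,
                        n_boot=1000, seed=0):
    """Return bootstrap replicates of mean(A) - mean(B).

    scores_a, scores_b: lists with one entry per item; each entry is a
    non-empty list of ratings for that item (hypothetical layout).
    n_items (N): items drawn per replicate.
    k_responses (K): responses drawn per item per replicate.
    """
    rng = random.Random(seed)
    item_ids = range(len(scores_a))
    replicates = []
    for _ in range(n_boot):
        # Level 1: resample items with replacement.
        sampled_items = rng.choices(item_ids, k=n_items)
        item_diffs = []
        for i in sampled_items:
            # Level 2: resample responses within the item with replacement.
            a = rng.choices(scores_a[i], k=k_responses)
            b = rng.choices(scores_b[i], k=k_responses)
            item_diffs.append(mean(a) - mean(b))
        replicates.append(mean(item_diffs))
    return replicates


def fraction_nonpositive(replicates):
    """Share of replicates in which system A does not beat system B."""
    return sum(d <= 0 for d in replicates) / len(replicates)
```

Running this over a grid of `n_items` and `k_responses` values and comparing `fraction_nonpositive` across the grid is one way to see how many items and responses a given effect size might need. Any threshold applied to that fraction is a choice left to the analyst.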
Receipt and verification
| First computed | 2026-05-18T02:44:15.506015Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
86af4cbdcb670bbd6077f70623c55c2190c2155bbf6044378f4e4a4788fa552b
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Q2XUZPOLM4F32YDX64DCHRK4EG \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 86af4cbdcb670bbd6077f70623c55c2190c2155bbf6044378f4e4a4788fa552b
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "164a9bef9af553411a717b73363bbad1d623c68a3be64100a6ded7d8d10528c4",
"cross_cats_sorted": [
"cs.AI"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-05-13T17:22:27Z",
"title_canon_sha256": "a29bc2911f33f194c25dd30689f81a22a966c777e44598867deef27b86c869f5"
},
"schema_version": "1.0",
"source": {
"id": "2605.13801",
"kind": "arxiv",
"version": 1
}
}
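The serialization step in the verification command can also be run offline on a saved copy of the canonical record. The sketch below mirrors the inline `python3 -c` snippet: sorted keys, compact separators, no ASCII escaping, UTF-8 encoding, then SHA-256. The function name `canonical_sha256` is hypothetical. Whether the result equals the hash listed under Canonical hash depends on the served record matching the JSON shown here.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a JSON-compatible object using the same canonical form as the
    verification command: sorted keys, compact separators, UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Because keys are sorted and whitespace is removed before hashing, the digest does not depend on key order or indentation in the saved file, for example `canonical_sha256(json.load(open("record.json")))` with a hypothetical file name.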