pith:YXRCLVYE
Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol
An audit-constrained protocol identifies genuine LLM reasoning errors from valid prompt variants while excluding artifacts, yet adaptive sampling yields no advantage over uniform sampling.
arxiv:2605.11599 v2 · 2026-05-12 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YXRCLVYEFMT2ZYMYSC4INW4OWL}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Across three audited slices, the protocol identifies confirmed model-error prompt keys while excluding formatting and extraction artifacts, but matched comparisons do not show that CAPS improves audited yield or unique prompt-key discovery over uniform sampling.
These claims assume that the semantic and extraction audit procedure reliably and consistently distinguishes genuine model reasoning errors from invalid perturbations, extraction artifacts, and unmatched search procedures, without introducing biases or omissions of its own.
An audit-constrained protocol for LLM reasoning tests finds that component-adaptive prompt sampling yields no improvement over uniform sampling in identifying confirmed model errors after semantic and extraction audits.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:03:17.913442Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
c5e225d7042b27ace19890b886db8eb2d7b623f3986c589de0d5b6d00a167b63
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YXRCLVYEFMT2ZYMYSC4INW4OWL \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c5e225d7042b27ace19890b886db8eb2d7b623f3986c589de0d5b6d00a167b63
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "63fd56cab966ff5f00f4b8321f89c673d54e08abb75401aa9e0537b7e965603c",
"cross_cats_sorted": [],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-05-12T06:26:22Z",
"title_canon_sha256": "0fbb8e1eeb3329a7cf2eac350bd850b477d372102c5210d245d6cdec773163c0"
},
"schema_version": "1.0",
"source": {
"id": "2605.11599",
"kind": "arxiv",
"version": 2
}
}
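The hashing step of the verification one-liner can also be run offline against the record shown above. The following is a minimal sketch, assuming the canonical hash is simply the SHA-256 of the record serialized with sorted keys and compact separators, as the one-liner suggests. The names `RECORD`, `EXPECTED`, and `canonical_hash` are illustrative and not part of any Pith API.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "63fd56cab966ff5f00f4b8321f89c673d54e08abb75401aa9e0537b7e965603c",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-12T06:26:22Z",
        "title_canon_sha256": "0fbb8e1eeb3329a7cf2eac350bd850b477d372102c5210d245d6cdec773163c0",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.11599", "kind": "arxiv", "version": 2},
}

# Value listed under "Canonical hash" on this page.
EXPECTED = "c5e225d7042b27ace19890b886db8eb2d7b623f3986c589de0d5b6d00a167b63"


def canonical_hash(record: dict) -> str:
    """Hypothetical helper: SHA-256 over sorted-key, compact-separator JSON.

    Mirrors the json.dumps arguments used in the shell one-liner; whether
    the service applies further normalization is an assumption.
    """
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode()
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Sorting keys and fixing the separators makes the serialization independent of how the JSON was pretty-printed or ordered, which is why the same digest can be recomputed from a copy of the record. If the script reports a mismatch, compare against the live record returned by the `curl` command before drawing conclusions, since the copy on this page may differ from what was hashed.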