Pith Number

pith:YXRCLVYE

pith:2026:YXRCLVYEFMT2ZYMYSC4INW4OWL
not attested · not anchored · not stored · refs pending

Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol

Hongmin Li

An audit-constrained protocol identifies genuine LLM reasoning errors from valid prompt variants while excluding artifacts, yet adaptive sampling yields no advantage over uniform sampling.

arxiv:2605.11599 v2 · 2026-05-12 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YXRCLVYEFMT2ZYMYSC4INW4OWL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Across three audited slices, the protocol identifies confirmed model-error prompt keys while excluding formatting and extraction artifacts, but matched comparisons do not show that CAPS improves audited yield or unique prompt-key discovery over uniform sampling.

C2 weakest assumption

That the semantic and extraction audit procedure reliably and consistently distinguishes genuine model reasoning errors from invalid perturbations, extraction artifacts, and unmatched search procedures without introducing its own biases or omissions.

C3 one-line summary

An audit-constrained protocol for LLM reasoning tests finds that component-adaptive prompt sampling yields no improvement over uniform sampling in identifying confirmed model errors after semantic and extraction audits.

Receipt and verification
First computed: 2026-05-20T00:03:17.913442Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

c5e225d7042b27ace19890b886db8eb2d7b623f3986c589de0d5b6d00a167b63

Aliases

arxiv: 2605.11599 · arxiv_version: 2605.11599v2 · doi: 10.48550/arxiv.2605.11599 · pith_short_12: YXRCLVYEFMT2 · pith_short_16: YXRCLVYEFMT2ZYMY · pith_short_8: YXRCLVYE
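
The identifiers above appear to be related to the canonical hash. As an observation about this record only (the page does not state the derivation rule, so treat it as an assumption), the 26-character Pith Number coincides with the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest, and the `pith_short_*` aliases are prefixes of that string. A minimal sketch, where `base32_prefix` is a hypothetical helper name:

```python
import base64

# Values copied verbatim from this record.
CANONICAL_HASH = "c5e225d7042b27ace19890b886db8eb2d7b623f3986c589de0d5b6d00a167b63"
PITH_NUMBER = "YXRCLVYEFMT2ZYMYSC4INW4OWL"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Hypothetical helper: base32-encode the digest bytes and truncate.

    Assumption: the truncation rule is inferred from this single record,
    not taken from a published Pith specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER))
print("derived matches Pith Number:", derived == PITH_NUMBER)

# The short aliases listed above are plain prefixes of the full number.
for n in (8, 12, 16):
    print(f"pith_short_{n}:", PITH_NUMBER[:n])
```

If the relationship holds in general, a short alias can be checked against a canonical hash without any network access; for other records it should be confirmed rather than assumed.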
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YXRCLVYEFMT2ZYMYSC4INW4OWL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c5e225d7042b27ace19890b886db8eb2d7b623f3986c589de0d5b6d00a167b63
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "63fd56cab966ff5f00f4b8321f89c673d54e08abb75401aa9e0537b7e965603c",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-12T06:26:22Z",
    "title_canon_sha256": "0fbb8e1eeb3329a7cf2eac350bd850b477d372102c5210d245d6cdec773163c0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.11599",
    "kind": "arxiv",
    "version": 2
  }
}
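
The verification command can also be reproduced offline from the record shown above. The sketch below mirrors the serialization used in the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) and compares the result with the published hash. The function names `canonical_bytes` and `canonical_sha256` are illustrative, and the sketch assumes the JSON displayed on this page is exactly what the API returns as `canonical_record`.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "63fd56cab966ff5f00f4b8321f89c673d54e08abb75401aa9e0537b7e965603c",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-12T06:26:22Z",
    "title_canon_sha256": "0fbb8e1eeb3329a7cf2eac350bd850b477d372102c5210d245d6cdec773163c0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.11599",
    "kind": "arxiv",
    "version": 2
  }
}
"""

# Hash published under "Canonical hash" on this page.
EXPECTED = "c5e225d7042b27ace19890b886db8eb2d7b623f3986c589de0d5b6d00a167b63"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the fields. A mismatch would indicate either a modified record or a difference between the displayed JSON and the served `canonical_record`.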