pith:Y4V6HMAC
What properties of reasoning supervision are associated with improved downstream model quality?
Intrinsic metrics on reasoning data strongly predict downstream model performance in a scale-dependent way.
arxiv:2605.13290 v1 · 2026-05-13 · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Y4V6HMACAJD67ZG4JGLUWEGRKY}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our analysis reveals that these intrinsic metrics demonstrate strong and significant correlations with downstream model performance. Crucially, we find that the predictors of utility are scale-dependent.
This rests on the assumption that the semantically distinct variants of a single Polish reasoning dataset are representative enough for the observed scale-dependent patterns to generalize to other languages, domains, and model families.
Intrinsic data metrics predict reasoning dataset utility for model fine-tuning, with different predictors working best for smaller versus larger models.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T02:44:49.126010Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c72be3b0020247efe4dc49974b10d1563dc8e33e016149d2e6e10327808b9913
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y4V6HMACAJD67ZG4JGLUWEGRKY \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c72be3b0020247efe4dc49974b10d1563dc8e33e016149d2e6e10327808b9913
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "066e71923d82a2f4dd5e76eb6faedb1a5f9f66947a96db1434281bc01d53e406",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-13T10:04:38Z",
    "title_canon_sha256": "953e1d82724d775a6928ad2fe96e76f53fc9a7250e47331bcf88840cc3f13822"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13290",
    "kind": "arxiv",
    "version": 1
  }
}
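The hashing step of the verification command can also be run offline against the record shown above. The following is a minimal sketch that mirrors the serialization in the page's one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes, SHA-256). The function name `canonical_hash` is an assumption made for illustration and is not part of any Pith tooling. If the record is intact, the printed digest should match the canonical hash listed on this page.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record in canonical JSON form.

    Canonical form here follows the page's one-liner: keys sorted,
    no whitespace between tokens, non-ASCII characters left unescaped.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "066e71923d82a2f4dd5e76eb6faedb1a5f9f66947a96db1434281bc01d53e406",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-13T10:04:38Z",
        "title_canon_sha256": "953e1d82724d775a6928ad2fe96e76f53fc9a7250e47331bcf88840cc3f13822",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13290", "kind": "arxiv", "version": 1},
}

digest = canonical_hash(record)
print(digest)  # compare by eye with the "Canonical hash" value on this page
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields were typed or received; only the values and key names matter.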