pith:DTXDWKRR
When to Trust Confidence Thresholding: Calibration Diagnostics for Pseudo-Labelled Regression
The attenuation bias from thresholding confidence scores can be predicted exactly from residual score variance on unlabeled data.
arxiv:2605.12780 v1 · 2026-05-12 · stat.ME · cs.LG · stat.ML
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DTXDWKRR22B45FA6FFKAZ224BL}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We derive a closed-form expression for the attenuation bias that confidence thresholding induces in the downstream regression coefficient, and show that the bias can be predicted, before any inference is run, from the residual score variance V^*=E[Var(p|X)] on the unlabelled set after partialling out the downstream controls X.
The recent identification result for the underlying moment equation holds exactly, and calibration drift remains bounded; the structural separation X ⊂ W is maintained so that V* is well-defined and observable.
Attenuation bias from confidence thresholding in pseudo-labelled regression equals a closed-form function of residual score variance V* after partialling out controls X, yielding a (V*, κ) safety rule computable before inference.
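The quantity V* = E[Var(p|X)] in the claims above can be illustrated with a small sketch. It assumes the controls X take a small number of discrete values, so that the conditional variance of the score p can be estimated cell by cell and averaged with cell-share weights. The function name `residual_score_variance` and the toy data are hypothetical; the paper's closed-form bias expression and its (V*, κ) rule are not reproduced on this page and are not implemented here.

```python
from collections import defaultdict
from statistics import pvariance


def residual_score_variance(scores, controls):
    """Estimate V* = E[Var(p | X)] for discrete controls X.

    Assumption (not from the source): X is discrete, so Var(p | X = x)
    is the population variance of p within each cell of X, and the outer
    expectation is a cell-share weighted average.
    """
    cells = defaultdict(list)
    for p, x in zip(scores, controls):
        cells[x].append(p)
    n = sum(len(values) for values in cells.values())
    return sum(len(values) / n * pvariance(values) for values in cells.values())


# Toy unlabelled set: confidence scores p and one discrete control X.
scores = [0.1, 0.3, 0.7, 0.9]
controls = ["a", "a", "b", "b"]

v_star = residual_score_variance(scores, controls)
total_variance = pvariance(scores)
print(v_star, total_variance)
```

By the law of total variance, the estimate can never exceed the unconditional variance of p; the gap is the share of score variation already explained by the controls. With continuous controls, a regression-based residual variance would be the natural substitute, again as an assumption rather than the paper's stated procedure.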
Receipt and verification
| First computed | 2026-05-18T03:09:13.160424Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
1cee3b2a31d683ce941e29540ceb5c0aeda49fe3d3c5b1b2a031a3a28669c31c
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1cee3b2a31d683ce941e29540ceb5c0aeda49fe3d3c5b1b2a031a3a28669c31c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1d557c1f6a42d683e3705d3f35cb3ab17d5461ad73b6b1290bceb02535d5c14b",
    "cross_cats_sorted": [
      "cs.LG",
      "stat.ML"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "stat.ME",
    "submitted_at": "2026-05-12T21:49:11Z",
    "title_canon_sha256": "df2c18ca52c09ed9500211c50279196e144e1d7e72d11fd380d2b013ea195203"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12780",
    "kind": "arxiv",
    "version": 1
  }
}
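The same check as the shell one-liner can be run offline against the record shown above. This is a minimal sketch that assumes the canonicalisation is exactly what the one-liner does (sorted keys, compact separators, UTF-8, no ASCII escaping); the helper name `canonical_sha256` is hypothetical and not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialise with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1d557c1f6a42d683e3705d3f35cb3ab17d5461ad73b6b1290bceb02535d5c14b",
        "cross_cats_sorted": ["cs.LG", "stat.ML"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "stat.ME",
        "submitted_at": "2026-05-12T21:49:11Z",
        "title_canon_sha256": "df2c18ca52c09ed9500211c50279196e144e1d7e72d11fd380d2b013ea195203",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12780", "kind": "arxiv", "version": 1},
}

# Hash published on this page under "Canonical hash".
EXPECTED = "1cee3b2a31d683ce941e29540ceb5c0aeda49fe3d3c5b1b2a031a3a28669c31c"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which the fields are typed into the dictionary does not affect the digest; only the values and the key names do.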