Pith Number

pith:DTXDWKRR

pith:2026:DTXDWKRR22B45FA6FFKAZ224BL
not attested · not anchored · not stored · refs resolved

When to Trust Confidence Thresholding: Calibration Diagnostics for Pseudo-Labelled Regression

Marcell T. Kurbucz

The attenuation bias from thresholding confidence scores can be predicted exactly from residual score variance on unlabeled data.

arxiv:2605.12780 v1 · 2026-05-12 · stat.ME · cs.LG · stat.ML

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DTXDWKRR22B45FA6FFKAZ224BL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We derive a closed-form expression for the attenuation bias that confidence thresholding induces in the downstream regression coefficient, and show that the bias can be predicted, before any inference is run, from the residual score variance V* = E[Var(p | X)] on the unlabelled set after partialling out the downstream controls X.
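As a rough illustration of the quantity named in this claim, the sketch below estimates V* = E[Var(p | X)] from unlabelled data alone. It assumes that partialling out X can be approximated by an ordinary least-squares projection of the confidence score p on X plus an intercept; that linear choice, the function name residual_score_variance, and the simulated data are all assumptions made here, not the paper's estimator. The closed-form bias expression itself is not reproduced in this record, so the sketch stops at V*.

```python
import numpy as np


def residual_score_variance(p, X):
    """Estimate V* = E[Var(p | X)] as the mean squared OLS residual of p on X.

    Assumption: E[p | X] is approximated by a linear function of X
    (with intercept). A flexible regressor could be swapped in.
    """
    p = np.asarray(p, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(len(p)), X])
    coef, *_ = np.linalg.lstsq(design, p, rcond=None)
    resid = p - design @ coef
    return float(np.mean(resid ** 2))


# Hypothetical unlabelled sample: two controls and a noisy confidence score.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
p = np.clip(0.5 + 0.1 * X[:, 0] + rng.normal(scale=0.1, size=n), 0.0, 1.0)

v_star = residual_score_variance(p, X)
print(f"estimated V*: {v_star:.4f}")
```

No outcome labels enter the computation, which is consistent with the claim that the diagnostic is available before any inference is run.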

C2 weakest assumption

The recent identification result for the underlying moment equation holds exactly, and calibration drift remains bounded; the structural separation X ⊂ W is maintained so that V* is well-defined and observable.

C3 one line summary

Attenuation bias from confidence thresholding in pseudo-labelled regression equals a closed-form function of residual score variance V* after partialling out controls X, yielding a (V*, κ) safety rule computable before inference.

References

29 extracted · 29 resolved · 1 Pith anchors

[1] N. Kallus, X. Mao, A. Zhou, Assessing algorithmic fairness with unobserved protected class using data combination, Management Science 68 (3) (2022) 1959–1981
[2] Lee, Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks, in: Workshop on challenges in representation learning, ICML, Vol 2013
[3] K. Sohn, D. Berthelot, C.-L. Li, Z. Zhang, N. Carlini, E. D. Cubuk, A. Kurakin, H. Zhang, C. Raffel, FixMatch: Simplifying semi-supervised learning with consistency and confidence, in: Advances in Neural 2020
[4] B. Zhang, Y. Wang, W. Hou, H. Wu, J. Wang, M. Okumura, T. Shinozaki, FlexMatch: Boosting semi-supervised learning with curriculum pseudo labeling, Advances in neural information processing systems 3 2021
[5] Y. Wang, H. Chen, Q. Heng, W. Hou, Y. Fan, Z. Wu, J. Wang, M. Savvides, T. Shinozaki, B. Raj, B. Schiele, X. Xie, FreeMatch: Self-adaptive thresholding for semi-supervised learning, International Co 2023
Receipt and verification
First computed 2026-05-18T03:09:13.160424Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

1cee3b2a31d683ce941e29540ceb5c0aeda49fe3d3c5b1b2a031a3a28669c31c

Aliases

arxiv: 2605.12780 · arxiv_version: 2605.12780v1 · doi: 10.48550/arxiv.2605.12780 · pith_short_12: DTXDWKRR22B4 · pith_short_16: DTXDWKRR22B45FA6 · pith_short_8: DTXDWKRR
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1cee3b2a31d683ce941e29540ceb5c0aeda49fe3d3c5b1b2a031a3a28669c31c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1d557c1f6a42d683e3705d3f35cb3ab17d5461ad73b6b1290bceb02535d5c14b",
    "cross_cats_sorted": [
      "cs.LG",
      "stat.ML"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "stat.ME",
    "submitted_at": "2026-05-12T21:49:11Z",
    "title_canon_sha256": "df2c18ca52c09ed9500211c50279196e144e1d7e72d11fd380d2b013ea195203"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12780",
    "kind": "arxiv",
    "version": 1
  }
}
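The canonicalisation step in the shell one-liner above can also be run offline against the record as printed here. The following is a minimal sketch that mirrors the same json.dumps settings (sorted keys, compact separators, ensure_ascii=False) and hashes the UTF-8 bytes; the helper name canonical_sha256 is introduced for illustration only. The digest it prints should be compared against the canonical hash listed above, and a mismatch would suggest the record was altered or transcribed differently.

```python
import hashlib
import json


def canonical_sha256(record):
    """SHA-256 over the compact, key-sorted JSON serialisation of `record`."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # UTF-8 by default
    return hashlib.sha256(payload).hexdigest()


# The canonical record exactly as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1d557c1f6a42d683e3705d3f35cb3ab17d5461ad73b6b1290bceb02535d5c14b",
        "cross_cats_sorted": ["cs.LG", "stat.ML"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "stat.ME",
        "submitted_at": "2026-05-12T21:49:11Z",
        "title_canon_sha256": "df2c18ca52c09ed9500211c50279196e144e1d7e72d11fd380d2b013ea195203",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12780", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the order in which fields appear in a mirrored copy does not affect the digest; only the values and key names do.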