{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:DTXDWKRR22B45FA6FFKAZ224BL","short_pith_number":"pith:DTXDWKRR","schema_version":"1.0","canonical_sha256":"1cee3b2a31d683ce941e29540ceb5c0aeda49fe3d3c5b1b2a031a3a28669c31c","source":{"kind":"arxiv","id":"2605.12780","version":1},"attestation_state":"computed","paper":{"title":"When to Trust Confidence Thresholding: Calibration Diagnostics for Pseudo-Labelled Regression","license":"http://creativecommons.org/licenses/by/4.0/","headline":"The attenuation bias from thresholding confidence scores can be predicted exactly from residual score variance on unlabeled data.","cross_cats":["cs.LG","stat.ML"],"primary_cat":"stat.ME","authors_text":"Marcell T. Kurbucz","submitted_at":"2026-05-12T21:49:11Z","abstract_excerpt":"Calibrated probability outputs of trained classifiers are increasingly used as inputs to downstream regression estimands such as effects, prevalences, or disparities for a latent group observed only on a small labelled subset. A standard practice is to threshold the calibrated score at a confidence cutoff and treat the hard label as the truth. Building on a recent identification result for the underlying moment equation, we develop a calibration-aware diagnostic apparatus for pseudo-labelling pipelines. 
We derive a closed-form expression for the attenuation bias that confidence thresholding in"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.12780","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"stat.ME","submitted_at":"2026-05-12T21:49:11Z","cross_cats_sorted":["cs.LG","stat.ML"],"title_canon_sha256":"df2c18ca52c09ed9500211c50279196e144e1d7e72d11fd380d2b013ea195203","abstract_canon_sha256":"1d557c1f6a42d683e3705d3f35cb3ab17d5461ad73b6b1290bceb02535d5c14b"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T03:09:13.161065Z","signature_b64":"meybZs8QYD6YO8MZ2+2h+elCTm2+oR2jAv+irRpKceTHmfSR6ysROIJM3iYZzQbL5h6lP0a+E/p5/dMHp4wAAA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"1cee3b2a31d683ce941e29540ceb5c0aeda49fe3d3c5b1b2a031a3a28669c31c","last_reissued_at":"2026-05-18T03:09:13.160424Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T03:09:13.160424Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"When to Trust Confidence Thresholding: Calibration Diagnostics for Pseudo-Labelled Regression","license":"http://creativecommons.org/licenses/by/4.0/","headline":"The attenuation bias from thresholding confidence scores can be predicted exactly from residual score variance on unlabeled data.","cross_cats":["cs.LG","stat.ML"],"primary_cat":"stat.ME","authors_text":"Marcell T. 
Kurbucz","submitted_at":"2026-05-12T21:49:11Z","abstract_excerpt":"Calibrated probability outputs of trained classifiers are increasingly used as inputs to downstream regression estimands such as effects, prevalences, or disparities for a latent group observed only on a small labelled subset. A standard practice is to threshold the calibrated score at a confidence cutoff and treat the hard label as the truth. Building on a recent identification result for the underlying moment equation, we develop a calibration-aware diagnostic apparatus for pseudo-labelling pipelines. We derive a closed-form expression for the attenuation bias that confidence thresholding in"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We derive a closed-form expression for the attenuation bias that confidence thresholding induces in the downstream regression coefficient, and show that the bias can be predicted, before any inference is run, from the residual score variance V^*=E[Var(p|X)] on the unlabelled set after partialling out the downstream controls X.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The recent identification result for the underlying moment equation holds exactly, and calibration drift remains bounded; the structural separation X subset W is maintained so that V* is well-defined and observable.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Attenuation bias from confidence thresholding in pseudo-labelled regression equals a closed-form function of residual score variance V* after partialling out controls X, yielding a (V*, κ) safety rule computable before inference.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"The attenuation bias from thresholding confidence scores can be 
predicted exactly from residual score variance on unlabeled data.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"9c42d5e3032145905cde071ff1213dd8f718571f9f73c6b1c1272526c23e19c8"},"source":{"id":"2605.12780","kind":"arxiv","version":1},"verdict":{"id":"f444f652-6915-47e6-a8c0-81fcc5e33cb5","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:37:52.002207Z","strongest_claim":"We derive a closed-form expression for the attenuation bias that confidence thresholding induces in the downstream regression coefficient, and show that the bias can be predicted, before any inference is run, from the residual score variance V^*=E[Var(p|X)] on the unlabelled set after partialling out the downstream controls X.","one_line_summary":"Attenuation bias from confidence thresholding in pseudo-labelled regression equals a closed-form function of residual score variance V* after partialling out controls X, yielding a (V*, κ) safety rule computable before inference.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The recent identification result for the underlying moment equation holds exactly, and calibration drift remains bounded; the structural separation X subset W is maintained so that V* is well-defined and observable.","pith_extraction_headline":"The attenuation bias from thresholding confidence scores can be predicted exactly from residual score variance on unlabeled data."},"references":{"count":29,"sample":[{"doi":"","year":2022,"title":"N. Kallus, X. Mao, A. 
Zhou, Assessing algorithmic fairness with unobserved protected class using data combination, Management Science 68 (3) (2022) 1959–1981","work_id":"9ad34ff2-8f76-4a83-9c5e-98afdded720f","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2013,"title":"Lee, Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks, in: Workshop on challenges in representation learning, ICML, Vol","work_id":"62970d96-ca1c-4271-b5fe-f33a4d80bf4b","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"K. Sohn, D. Berthelot, C.-L. Li, Z. Zhang, N. Carlini, E. D. Cubuk, A. Kurakin, H. Zhang, C. Raffel, FixMatch: Simplifying semi-supervised learning with consistency and confidence, in: Advances in Neural ","work_id":"24890458-18e4-407a-8dd4-79f3be0d1ebf","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"B. Zhang, Y. Wang, W. Hou, H. Wu, J. Wang, M. Okumura, T. Shinozaki, FlexMatch: Boosting semi-supervised learning with curriculum pseudo labeling, Advances in neural information processing systems 3","work_id":"64abac3b-d40f-430e-83f3-a35a560ccf70","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Y. Wang, H. Chen, Q. Heng, W. Hou, Y. Fan, Z. Wu, J. Wang, M. Savvides, T. Shinozaki, B. Raj, B. Schiele, X. 
Xie, FreeMatch: Self-adaptive thresholding for semi-supervised learning, International Co","work_id":"2f4d3fa4-f8ed-4aaf-a979-93491cad41e4","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":29,"snapshot_sha256":"b2c0abaf5b0dce5015e8bf7966893fa85616bbb5efba78f23e2000376f2080f7","internal_anchors":1},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.12780","created_at":"2026-05-18T03:09:13.160522+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.12780v1","created_at":"2026-05-18T03:09:13.160522+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.12780","created_at":"2026-05-18T03:09:13.160522+00:00"},{"alias_kind":"pith_short_12","alias_value":"DTXDWKRR22B4","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"DTXDWKRR22B45FA6","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"DTXDWKRR","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL","json":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL.json","graph_json":"https://pith.science/api/pith-number/DTXDWKRR22B45FA6FFKAZ224BL/graph.json","events_json":"https://pith.science/api/pith-number/DTXDWKRR22B45FA6FFKAZ224BL/events.json","paper":"https://pith.science/paper/DTXDWKRR"},"agent_actions":{"view_html":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL","download_json":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL.json","view_
paper":"https://pith.science/paper/DTXDWKRR","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.12780&json=true","fetch_graph":"https://pith.science/api/pith-number/DTXDWKRR22B45FA6FFKAZ224BL/graph.json","fetch_events":"https://pith.science/api/pith-number/DTXDWKRR22B45FA6FFKAZ224BL/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL/action/timestamp_anchor","attest_storage":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL/action/storage_attestation","attest_author":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL/action/author_attestation","sign_citation":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL/action/citation_signature","submit_replication":"https://pith.science/pith/DTXDWKRR22B45FA6FFKAZ224BL/action/replication_record"}},"created_at":"2026-05-18T03:09:13.160522+00:00","updated_at":"2026-05-18T03:09:13.160522+00:00"}