pith:5CF2FRIB
Distribution Shift in Missing Data Imputation: A Risk-Based Perspective and Importance-Weighted Correction under MAR
Standard imputation fails to minimize full-data error under MAR because observed training data differs in distribution from the target.
arxiv:2602.06713 v2 · 2026-02-06 · stat.ML · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{5CF2FRIB7XEHHQQBCH4BJQVLJD}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We propose a novel imputation algorithm designed to learn an imputation model from the observed data while explicitly accounting for this distribution shift. Simulation studies show consistent improvements over otherwise identical uncorrected baselines, with average reductions of 3% in RMSE and 7% in Wasserstein distance.
The missingness mechanism satisfies MAR so that missingness probabilities can be estimated from observed data alone and used to form reliable importance weights without introducing additional bias.
Standard imputation methods fail to minimize full-data MSE under MAR because of distribution shift; a new importance-weighted algorithm corrects for it and, in simulations, reduces RMSE by 3% and Wasserstein distance by 7% on average.
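The claims above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm: it assumes a single always-observed covariate `x`, a target `y` that is missing at random given `x`, a deliberately misspecified linear imputation model, and inverse-probability weights `1 / P(observed | x)` estimated by a simple binned frequency. All function names, the data-generating process, and the choice of weights are illustrative assumptions.

```python
import math
import random


def simulate(n, seed=0):
    """Draw (x, y, observed) rows; y is missing at random given x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x * x + rng.gauss(0.0, 0.1)  # nonlinear truth, so a line is misspecified
        p_obs = 1.0 / (1.0 + math.exp(-1.5 * x))  # MAR: depends only on observed x
        rows.append((x, y, rng.random() < p_obs))
    return rows


def bin_index(x, n_bins=12, lo=-3.0, hi=3.0):
    """Map x to an equal-width bin on [lo, hi], clamping the tails."""
    k = int((x - lo) / (hi - lo) * n_bins)
    return min(max(k, 0), n_bins - 1)


def estimate_propensity(rows):
    """Estimate P(observed | x) as the observed fraction within each x bin."""
    total, seen = {}, {}
    for x, _, r in rows:
        k = bin_index(x)
        total[k] = total.get(k, 0) + 1
        seen[k] = seen.get(k, 0) + (1 if r else 0)
    return {k: seen[k] / total[k] for k in total}


def weighted_line_fit(points):
    """Weighted least squares for y ~ a + b*x from (x, y, w) triples."""
    sw = sum(w for _, _, w in points)
    mx = sum(w * x for x, _, w in points) / sw
    my = sum(w * y for _, y, w in points) / sw
    sxx = sum(w * (x - mx) ** 2 for x, _, w in points)
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in points)
    b = sxy / sxx
    return my - b * mx, b


def full_data_mse(rows, a, b):
    """Mean squared error over all rows, including those with y missing.

    Only possible in a simulation, where the hidden y values are known.
    """
    return sum((y - (a + b * x)) ** 2 for x, y, _ in rows) / len(rows)


def compare(n=20000, seed=0):
    """Return full-data MSE of the unweighted and the weighted fit."""
    rows = simulate(n, seed)
    prop = estimate_propensity(rows)
    observed = [(x, y) for x, y, r in rows if r]
    plain = weighted_line_fit([(x, y, 1.0) for x, y in observed])
    weighted = weighted_line_fit(
        [(x, y, 1.0 / prop[bin_index(x)]) for x, y in observed]
    )
    return full_data_mse(rows, *plain), full_data_mse(rows, *weighted)


mse_plain, mse_weighted = compare()
print(f"full-data MSE, unweighted fit: {mse_plain:.3f}")
print(f"full-data MSE, weighted fit:   {mse_weighted:.3f}")
```

Because rows with large `x` are observed more often, the unweighted fit is tuned to that region and transfers poorly to the full data. Reweighting the observed rows moves the training objective back toward the full-data risk. The size of the gap in this toy setup says nothing about the 3% and 7% figures reported for the paper's simulations.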
Receipt and verification
| First computed | 2026-05-18T02:45:05.373339Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
e88ba2c501fdc873c20111f814c2ab48f262c39038db48000c9e16c1097bdf1e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/5CF2FRIB7XEHHQQBCH4BJQVLJD \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e88ba2c501fdc873c20111f814c2ab48f262c39038db48000c9e16c1097bdf1e
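The same check can be run offline against a saved copy of the record. The sketch below mirrors the serialization used by the inline Python in the pipeline above (sorted keys, compact separators, `ensure_ascii=False`, SHA-256 over the encoded bytes). The function name `canonical_sha256` is illustrative, and the `record` literal is copied from the canonical record shown on this page; whether it reproduces the published hash is left for the reader to confirm by running it.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a JSON-compatible object with the same serialization as the one-liner."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Canonical record as displayed on the page (copied by hand, so typos are possible).
record = {
    "metadata": {
        "abstract_canon_sha256": "154ba1dec97baf864cb26a2c7b78f7007eafd1ae3ba81abd47c4ebadabcf3e3b",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "stat.ML",
        "submitted_at": "2026-02-06T14:02:12Z",
        "title_canon_sha256": "fb588127e8260d5200ea0ca53675ea64cfe3ea06cf46aea5de227913b21a62dd",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.06713", "kind": "arxiv", "version": 2},
}

EXPECTED = "e88ba2c501fdc873c20111f814c2ab48f262c39038db48000c9e16c1097bdf1e"

digest = canonical_sha256(record)
print(digest)
print("matches page hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the stored JSON, which is why the record can be pretty-printed for display without changing its hash.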
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "154ba1dec97baf864cb26a2c7b78f7007eafd1ae3ba81abd47c4ebadabcf3e3b",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "stat.ML",
    "submitted_at": "2026-02-06T14:02:12Z",
    "title_canon_sha256": "fb588127e8260d5200ea0ca53675ea64cfe3ea06cf46aea5de227913b21a62dd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.06713",
    "kind": "arxiv",
    "version": 2
  }
}