Semiparametric semi-supervised learning for general targets under distribution shift and decaying overlap
Pith reviewed 2026-05-22 16:58 UTC · model grok-4.3
The pith
Augmented inverse probability weighting estimators maintain double robustness and semiparametric efficiency for general targets in semi-supervised learning under vanishing overlap and distribution shift.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce the D2S3 framework for estimation, inference, and efficiency benchmarking in semi-supervised learning under missing-at-random labels and vanishing overlap. Augmented inverse probability weighting estimators are shown to preserve double robustness, asymptotic normality, and semiparametric efficiency even with high-dimensional nuisance estimation and distributional shift. Classical root-n rates no longer hold; instead, corrected asymptotic rates are derived that explicitly incorporate the decay in overlap probability.
What carries the argument
Augmented inverse probability weighting estimators, corrected for the decay rate of the labeling probability, carry double robustness and efficiency through the regime of vanishing overlap.
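A minimal sketch of what such an estimator computes for the simplest target, a mean, may help. The function name `aipw_mean`, the toy data-generating process, and the use of oracle (true) nuisance functions are illustrative assumptions, not the paper's implementation. The score has the usual AIPW form: an outcome-regression prediction plus an inverse-probability-weighted residual on labeled rows.

```python
import numpy as np

def aipw_mean(y, r, mu_hat, pi_hat):
    """AIPW estimate of E[Y] with a plug-in standard error.

    y      : outcomes (entries with r == 0 may be NaN and are never used)
    r      : 1 if the label is observed, 0 otherwise
    mu_hat : outcome regression evaluated at each X_i
    pi_hat : labeling probability evaluated at each X_i
    """
    resid = np.where(r == 1, y - mu_hat, 0.0)    # residual only on labeled rows
    scores = mu_hat + r / pi_hat * resid         # per-observation AIPW scores
    theta = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return theta, se

# Toy data; every choice below is an illustrative assumption.
rng = np.random.default_rng(0)
m = 200_000                                  # labeled + unlabeled sample size
x = rng.normal(size=m)
y = 1.0 + 2.0 * x + rng.normal(size=m)       # true mean of Y is 1.0
pi = 0.02 / (1.0 + np.exp(-x))               # small labeling probabilities, MAR given X
r = rng.binomial(1, pi)
y_obs = np.where(r == 1, y, np.nan)          # hide outcomes of unlabeled rows

theta, se = aipw_mean(y_obs, r, mu_hat=1.0 + 2.0 * x, pi_hat=pi)
print(theta, se)
```

Because the weights `1 / pi_hat` are large when labels are rare, the standard error is driven by the number of labeled rows rather than by `m`, which is the intuition behind replacing the root-n rate.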
If this is right
- Double robustness holds for smooth targets such as means, linear coefficients, quantiles, and causal effects.
- Asymptotic normality is retained with rates adjusted downward from root-n according to the speed of overlap decay.
- Semiparametric efficiency is achieved under high-dimensional nuisance functions and distributional shift.
- The framework applies directly to settings with scarce labeled data such as IoT and public health studies.
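The double-robustness claim in the first bullet can be traced with a short calculation based on the observed-data influence function quoted later on this page. The sketch below assumes, as is standard but not confirmed by this page, that $\phi_F$ is a full-data quantity with $\mathbb{E}[\phi_F] = \theta^\star$, that $\mu^\star(X) = \mathbb{E}[\phi_F \mid X]$, and that MAR means $\mathbb{E}[R \mid X, \phi_F] = \pi^\star(X)$. Here $\mu$ and $\pi$ denote working, possibly misspecified, nuisance functions.

```latex
\begin{align*}
\mathbb{E}\!\left[\mu(X) + \frac{R}{\pi(X)}\bigl(\phi_F - \mu(X)\bigr)\right] - \theta^\star
 &= \mathbb{E}\!\left[\mu(X) + \frac{\pi^\star(X)}{\pi(X)}\bigl(\mu^\star(X) - \mu(X)\bigr)\right] - \theta^\star
    && \text{(MAR, then condition on } X\text{)} \\
 &= \mathbb{E}\!\left[\Bigl(\frac{\pi^\star(X)}{\pi(X)} - 1\Bigr)\bigl(\mu^\star(X) - \mu(X)\bigr)\right]
    && \text{(since } \mathbb{E}[\mu^\star(X)] = \theta^\star\text{)}.
\end{align*}
```

The bias is a product of the two nuisance errors, so it vanishes when either $\pi = \pi^\star$ or $\mu = \mu^\star$. Since $\pi^\star/\pi - 1 = (\pi^\star - \pi)/\pi$, a propensity error is amplified by $1/\pi$, which suggests why nuisance rate requirements have to account for the decay in overlap.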
Where Pith is reading between the lines
- Accurate estimation of the decay rate itself becomes a key practical requirement for reliable inference.
- The same rate-adjustment logic may extend to other estimators besides augmented inverse probability weighting.
- Applications in limited-experiment causal inference could benefit from treating the labeled sample as the experimental arm with decaying selection probability.
Load-bearing premise
The rate at which overlap between labeled and unlabeled samples decays is known or can be estimated at a rate fast enough for the corrected asymptotic results to apply.
What would settle it
A simulation with known overlap decay faster than the modeled rate in which the estimator's distribution deviates from the predicted corrected asymptotic normality or efficiency bound.
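A Monte Carlo check of this kind could be organized as in the sketch below. It is a hedged illustration, not the paper's simulation design: the helper `aipw_draws`, the decay `pi_n = 5 / sqrt(n)`, the linear outcome model, and the use of oracle nuisance functions are all assumptions. It covers only the baseline case where the decay matches the modeled rate; a falsification experiment would repeat it with a faster or irregular decay and compare the resulting distribution with the predicted one.

```python
import numpy as np

def aipw_draws(n, pi_n, reps, rng):
    """Monte Carlo draws of an AIPW mean estimate with oracle nuisances."""
    est = np.empty(reps)
    for b in range(reps):
        x = rng.normal(size=n)
        mu = 1.0 + 2.0 * x                        # true outcome regression
        pi = 2.0 * pi_n / (1.0 + np.exp(-x))      # MAR propensity, averages to pi_n
        r = rng.binomial(1, pi)
        eps = rng.normal(size=n)                  # outcome noise, Y = mu + eps
        est[b] = np.mean(mu + r / pi * eps)       # AIPW score, using Y - mu = eps
    return est

rng = np.random.default_rng(1)
results = {}
for n in (2_500, 40_000):
    pi_n = 5.0 / np.sqrt(n)                       # assumed decay: pi_n of order n^{-1/2}
    sd = aipw_draws(n, pi_n, reps=400, rng=rng).std(ddof=1)
    results[n] = {"root_n": np.sqrt(n) * sd, "corrected": np.sqrt(n * pi_n) * sd}
    print(n, pi_n, results[n])
```

If the corrected scaling is appropriate, the `corrected` value should stay roughly flat across `n`, while the `root_n` value keeps growing as the overlap shrinks.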
Original abstract
In modern scientific applications, large volumes of covariate data are readily available, while outcome labels are costly, sparse, and often subject to distribution shift. This asymmetry has spurred interest in semi-supervised (SS) learning, but most existing approaches rely on strong assumptions -- such as missing completely at random (MCAR) labeling or strict positivity -- that put substantial limitations on their practical usefulness. In this work, we introduce a general semiparametric framework for estimation, inference, and efficiency benchmarking in SS settings where labels are missing at random (MAR) and the overlap may vanish as sample size increases. Our framework, that we label D2S3, accommodates a wide range of smooth statistical targets -- including means, linear regression coefficients, quantiles, and causal effects -- and remains valid under high-dimensional nuisance estimation and distributional shift between labeled and unlabeled samples. We extend the theoretical guarantees of augmented inverse probability weighting estimators to preserve double robustness, asymptotic normality, and semiparametric efficiency under this challenging D2S3 regime. A key insight is that classical root-n convergence fails under vanishing overlap; we instead provide corrected asymptotic rates that capture the impact of the decay in overlap. We validate our theory through simulations and demonstrate practical utility in real-world applications on the internet of things and public health where labeled data are scarce.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the D2S3 framework for semiparametric semi-supervised estimation under MAR labeling, vanishing overlap, and distribution shift between labeled and unlabeled samples. It extends augmented inverse probability weighting (AIPW) estimators to general smooth targets (means, linear regression coefficients, quantiles, causal effects) while preserving double robustness, deriving corrected asymptotic rates to restore asymptotic normality and semiparametric efficiency when classical root-n convergence fails due to decaying overlap, and accommodating high-dimensional nuisance estimation.
Significance. If the corrected rates and associated conditions are rigorously derived and the overlap decay is either known or estimable at a sufficient rate, the work would meaningfully extend classical AIPW theory to more realistic semi-supervised regimes with vanishing positivity and distribution shift. The generality across targets and the explicit handling of non-root-n rates are potential strengths for enabling valid inference where standard semiparametric results do not apply.
major comments (2)
- [Abstract] Abstract: The central claim that AIPW estimators preserve asymptotic normality and semiparametric efficiency under the D2S3 regime via 'corrected asymptotic rates' that depend on the overlap decay sequence is load-bearing for all downstream guarantees, yet the abstract provides no explicit form for these rates, no conditions on the decay (e.g., deterministic vs. random, monotonicity, or relative to sample size n), and no nuisance rate requirements that incorporate the decay. If the true decay is faster or irregular, the bias-variance balance in the expansion would fail, invalidating normality and efficiency.
- [Theoretical results] Theoretical development: The extension of double robustness requires that the overlap decay rate be known a priori or estimated without disturbing the asymptotic expansion; the manuscript must state the precise conditions under which the post-hoc adjustments to the AIPW expansion remain valid, including how estimation error in the propensity or overlap interacts with the vanishing overlap term.
minor comments (2)
- [Abstract] The acronym D2S3 is used without spelling out 'Distribution shift, Decaying overlap, Semi-Supervised' on first use in the abstract.
- [Simulations] Simulation section: Provide more detail on how the overlap decay sequence is generated and whether it is treated as known or estimated in the reported experiments.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments highlight important aspects of clarity in presenting the D2S3 framework's rates and conditions. We address each major comment below and will revise the manuscript to strengthen the exposition while preserving the core contributions.
Point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that AIPW estimators preserve asymptotic normality and semiparametric efficiency under the D2S3 regime via 'corrected asymptotic rates' that depend on the overlap decay sequence is load-bearing for all downstream guarantees, yet the abstract provides no explicit form for these rates, no conditions on the decay (e.g., deterministic vs. random, monotonicity, or relative to sample size n), and no nuisance rate requirements that incorporate the decay. If the true decay is faster or irregular, the bias-variance balance in the expansion would fail, invalidating normality and efficiency.
Authors: We agree that the abstract would benefit from greater explicitness on this central claim. The corrected asymptotic rate is derived in Theorem 1 as n^{-1/2} pi_n^{-1/2} (where pi_n denotes the overlap decay sequence), under the assumption that pi_n is deterministic, non-increasing, and known up to estimation at a rate that does not dominate the leading term. Nuisance estimation must satisfy ||hat{eta} - eta|| = o_p((n pi_n)^{-1/4}) to preserve the expansion and double robustness. We will revise the abstract to briefly state the rate form, the deterministic monotonicity assumption on pi_n, and the adjusted nuisance rate requirement. This change improves transparency without affecting the manuscript's theoretical results.
Revision: yes
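To make the two rates in this response concrete, the short sketch below evaluates them for a few values of pi_n. The function names `corrected_rate` and `nuisance_rate_bound` and the chosen values of n and pi_n are illustrative assumptions; the formulas are the ones stated in the response, n^{-1/2} pi_n^{-1/2} and (n pi_n)^{-1/4}.

```python
def corrected_rate(n, pi_n):
    """Estimator rate n^{-1/2} * pi_n^{-1/2}, i.e. (n * pi_n)^{-1/2}."""
    return (n * pi_n) ** -0.5

def nuisance_rate_bound(n, pi_n):
    """Order (n * pi_n)^{-1/4} that the nuisance error must fall below."""
    return (n * pi_n) ** -0.25

n = 1_000_000
for pi_n in (1.0, 1e-2, 1e-4):
    print(pi_n, corrected_rate(n, pi_n), nuisance_rate_bound(n, pi_n))
```

With n = 10^6 and pi_n = 10^{-4}, the product n pi_n is 100, so the rate is of order 0.1 rather than the root-n value 0.001. In that sense n pi_n plays the role of an effective sample size.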
-
Referee: [Theoretical results] Theoretical development: The extension of double robustness requires that the overlap decay rate be known a priori or estimated without disturbing the asymptotic expansion; the manuscript must state the precise conditions under which the post-hoc adjustments to the AIPW expansion remain valid, including how estimation error in the propensity or overlap interacts with the vanishing overlap term.
Authors: The manuscript states these conditions in Assumption 3 and the proof of Theorem 2: the sequence {pi_n} may be treated as known for the rate correction, while estimation of pi_n (discussed in Section 4.2) is required to converge at o_p(pi_n) so that it does not alter the leading asymptotic term. Propensity estimation error interacts with the vanishing overlap through the product bound o_p((n pi_n)^{-1/2}), which is ensured by the high-dimensional nuisance rates already imposed. We will add an explicit remark in the revision clarifying this interaction and the precise conditions under which the post-hoc adjustments remain valid, thereby making the preservation of double robustness fully transparent.
Revision: yes
Circularity Check
Derivation of corrected rates under D2S3 regime is self-contained and independent of inputs
full rationale
The paper extends classical AIPW double robustness and efficiency results to the MAR + vanishing overlap + shift setting by deriving explicit corrected asymptotic expansions that depend on the overlap decay sequence. These rates follow directly from the stated modeling assumptions (MAR, known or estimable decay, high-dimensional nuisance rates) rather than from any parameter fitted to the target functional on the same data. No equation reduces to a tautology, no prediction is a renamed fit, and no load-bearing step collapses to a self-citation whose content is itself unverified. The framework therefore remains non-circular.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Labels are missing at random (MAR)
- domain assumption: Overlap between labeled and unlabeled samples may vanish as sample size grows
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
We extend the theoretical guarantees of augmented inverse probability weighting estimators to preserve double robustness, asymptotic normality, and semiparametric efficiency under this challenging D2S3 regime... corrected asymptotic rates that capture the impact of the decay in overlap.
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: embed_strictMono_of_one_lt (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
the observed-data influence function... $\phi(D;\theta^\star) = \mu^\star(X) + \frac{R}{\pi^\star(X)}\bigl(\phi_F - \mu^\star(X)\bigr) - \theta^\star$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
Transporting treatment effects by calibrating large-scale observational outcomes
Proposes a calibration-based estimator for transported average treatment effects that is consistent under correct specification and achieves semiparametric efficiency with large observational data.
-
Transporting treatment effects by calibrating large-scale observational outcomes
A calibration procedure yields a weighted transported average treatment effect with asymptotically valid and efficient inference when experimental data grows slower than observational data, even without positivity or ...
Reference graph
Works this paper leans on
-
[1]
PPI++: Efficient Prediction-Powered Inference
Anastasios N Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I Jordan, and Tijana Zrnic. Prediction-powered inference. Science, 382(6671):669–674, 2023a. Anastasios N Angelopoulos, John C Duchi, and Tijana Zrnic. PPI++: Efficient prediction-powered inference. arXiv preprint arXiv:2311.01453, 2023b. David Azriel, Lawrence D Brown, Michael Sklar, Richar...
-
[2]
Jacob Carlson and Melissa Dell. A unifying framework for robust and efficient inference with unstructured data. arXiv preprint arXiv:2505.00282,
-
[3]
Abhishek Chakrabortty, Guorong Dai, and Raymond J Carroll. Semi-supervised quantile estimation: Robust and efficient inference in high dimensional settings. arXiv preprint arXiv:2201.10208,
-
[4]
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]. IEEE Transactions on Neural Networks, 20(3):542–542,
-
[5]
Wenlong Ji, Lihua Lei, and Tijana Zrnic. Predictions as surrogates: Revisiting surrogate outcomes in the age of AI. arXiv preprint arXiv:2501.09731,
-
[6]
Irregular identification, support conditions, and inverse weight estimation
Shakeeb Khan and Elie Tamer. Irregular identification, support conditions, and inverse weight estimation. Econometrica, 78(6):2021–2042,
-
[7]
Dan M Kluger, Kerri Lu, Tijana Zrnic, Sherrie Wang, and Stephen Bates. Prediction-powered inference with imputed covariates and nonuniform sampling. arXiv preprint arXiv:2501.18577,
-
[8]
Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, and Qiongshi Lu. Assumption-lean and data-adaptive post-prediction inference. arXiv preprint arXiv:2311.14220,
-
[9]
Haeun Moon, Jin-Hong Du, Jing Lei, and Kathryn Roeder. Augmented doubly robust post-imputation inference for proteomic data. bioRxiv, pages 2024–03,
-
[10]
Jiawei Shan, Yiming Dong, and Jiwei Zhao. Sada: Safe and adaptive inference with multiple black-box predictions. arXiv preprint arXiv:2509.21707,
-
[11]
Zichun Xu, Daniela Witten, and Ali Shojaie. A unified framework for semiparametrically efficient semi-supervised learning. arXiv preprint arXiv:2502.17741,
-
[12]
Yuqian Zhang, Abhishek Chakrabortty, and Jelena Bradic. The decaying missing-at-random framework: Doubly robust causal inference with partially labeled data. arXiv preprint arXiv:2305.12789, 2023a. Yuqian Zhang, Abhishek Chakrabortty, and Jelena Bradic. Double robust semi-supervised inference for the mean: selection bias under MAR labeling with decaying ...
-
[13]
Supplementary Material of “Semiparametric semi-supervised learning for general targets under distribution shift and decaying overlap”. A. Proofs of main statements. Wherever convenient, integrals and expectations are written in linear functional notation. Thus, instead of $\mathbb{E}[f(X)] = \int f(x)\, P(dx)$, we sometimes write $P[f]$. In particular, we denote ...
-
[14]
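Therefore the following Lindeberg condition also holds:
$(n+N)^{-1} \sum_{i=1}^{n+N} \mathbb{E}\Bigl[ \bigl\| \bar{V}_{\theta^\star}^{-1/2} \phi(D_i; \theta^\star; \bar\mu; \pi^\star_{n,N}) \bigr\|_2^2 \, \mathbf{1}\bigl\{ \bigl\| \bar{V}_{\theta^\star}^{-1/2} \phi(D_i; \theta^\star; \bar\mu; \pi^\star_{n,N}) \bigr\|_2 > \epsilon \sqrt{n+N} \bigr\} \Bigr] \to 0$ as $n, N \to \infty$, (39)
which implies, by the Lindeberg-Feller Central Limit Theorem (Van der Vaart, 2000), that
$(n+N)^{-1/2} \, \bar{V}_{\theta^\star}^{-1/2} \sum_{i=1}^{n+N} \phi(D_i; \theta^\star; \bar\mu; \pi^\star_{n,N}) \rightsquigarrow N(0, I_q)$. (40)
In summ...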
-
[15]
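Therefore the following Lindeberg condition also holds:
$(n+N)^{-1} \sum_{i=1}^{n+N} \mathbb{E}\Bigl[ \bigl\| \bar{V}_{\theta^\star}^{-1/2} \phi(D_i; \theta^\star; \bar\mu; \bar\pi_{n,N}) \bigr\|_2^2 \, \mathbf{1}\bigl\{ \bigl\| \bar{V}_{\theta^\star}^{-1/2} \phi(D_i; \theta^\star; \bar\mu; \bar\pi) \bigr\|_2 > \epsilon \sqrt{n+N} \bigr\} \Bigr] \to 0$ as $n, N \to \infty$, (49)
which implies, by the Lindeberg-Feller Central Limit Theorem (Van der Vaart, 2000), that
$(n+N)^{-1/2} \, \bar{V}_{\theta^\star}^{-1/2} \sum_{i=1}^{n+N} \phi(D_i; \theta^\star; \bar\mu; \bar\pi) \rightsquigarrow N(0, I_q)$. (50)
In summary, we have (n+N...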
-
[16]
For the latter, the outcome regression model $\hat{\mu}$ is estimated using random forests (Breiman, 2001), with both biomarker and clinical covariates included as predictors. The propensity score $\hat{\pi}$ is estimated via logistic regression, again leveraging both sets of covariates. All nuisance functions are estimated using cross-fitting with J = 5 folds to ensure val...
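The cross-fitting step described in this excerpt can be sketched as follows, for a mean target. This is an assumption-laden illustration rather than the authors' pipeline: to stay dependency-free it swaps the random forest for ordinary least squares, fits the logistic regression with a hand-written Newton iteration, and runs on simulated data; the helper names `fit_ols`, `fit_logistic`, and `cross_fit_aipw` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept (stand-in for the random forest)."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def fit_logistic(X, r, iters=25):
    """Logistic regression with intercept, fitted by Newton's method."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        hess = A.T @ (A * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(A.shape[1])
        beta += np.linalg.solve(hess, A.T @ (r - p))
    return beta

def cross_fit_aipw(X, y, r, n_folds=5, seed=0):
    """Cross-fitted AIPW mean: nuisances fit off-fold, scores evaluated in-fold."""
    n = len(r)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    scores = np.empty(n)
    for j in range(n_folds):
        train, test = folds != j, folds == j
        lab = train & (r == 1)
        b_mu = fit_ols(X[lab], y[lab])             # outcome model: labeled training rows
        b_pi = fit_logistic(X[train], r[train])    # propensity model: all training rows
        A = np.column_stack([np.ones(test.sum()), X[test]])
        mu = A @ b_mu
        pi = 1.0 / (1.0 + np.exp(-A @ b_pi))
        resid = np.where(r[test] == 1, y[test] - mu, 0.0)
        scores[test] = mu + r[test] / pi * resid
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n)

# Simulated data; all values are illustrative assumptions.
rng = np.random.default_rng(2)
n = 20_000
X = rng.normal(size=(n, 2))
y = 0.5 + X @ np.array([1.0, -1.0]) + rng.normal(size=n)    # true mean is 0.5
r = rng.binomial(1, 1.0 / (1.0 + np.exp(2.0 - X[:, 0])))    # labels rarer for small X[:, 0]
y = np.where(r == 1, y, np.nan)                              # hide unlabeled outcomes
theta, se = cross_fit_aipw(X, y, r)
print(theta, se)
```

Each observation's score uses nuisance fits that never saw that observation, which is the purpose of cross-fitting; any learner with fit and predict steps could replace the two stand-in fitters.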