arxiv: 2605.02072 · v1 · submitted 2026-05-03 · 💻 cs.LG

Weight Clipping for Robust Conformal Inference under Unbounded Covariate Shifts

Pith reviewed 2026-05-08 19:14 UTC · model grok-4.3

classification 💻 cs.LG
keywords conformal prediction · covariate shift · density ratio estimation · weight clipping · importance fitting · prediction sets · robust inference

The pith

Clipped density ratio estimates restore reliable coverage in weighted conformal prediction even when the true ratio is unbounded.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard weighted conformal prediction loses its coverage guarantees when the density ratio between training and test distributions is unbounded or must be learned from finite samples. The paper introduces clipped least-squares importance fitting to estimate these ratios with controlled variance. Plugging the resulting clipped weights into weighted conformal prediction produces a method whose expected undercoverage stays bounded. The remaining gap is closed by running the procedure at a modestly higher target coverage level whose exact inflation amount is estimated directly from the observed data. The resulting guarantees are the first for any clipped-weight approach and hold with sample sizes that do not grow with higher moments of the true density ratio.
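The thresholding step described above can be sketched as follows. This is a minimal illustration of the standard weighted conformal quantile with a hard clip applied to the weights, not the paper's exact procedure: the function name `clipped_weighted_threshold`, its parameters, and the choice to clip the test-point weight as well are all assumptions made for the example.

```python
import numpy as np

def clipped_weighted_threshold(scores, weights, test_weight, alpha, clip):
    """Weighted (1 - alpha) quantile of calibration scores, with weights clipped at `clip`.

    Hypothetical helper: the test point contributes a point mass at +infinity,
    as in standard weighted conformal prediction.
    """
    scores = np.asarray(scores, dtype=float)
    # Clip the (estimated or known) density-ratio weights from above.
    w = np.minimum(np.asarray(weights, dtype=float), clip)
    w_test = min(float(test_weight), clip)

    order = np.argsort(scores)
    sorted_scores = scores[order]
    # Cumulative normalized mass on the sorted calibration scores.
    cum = np.cumsum(w[order]) / (w.sum() + w_test)

    # First score whose cumulative mass reaches 1 - alpha; if the calibration
    # mass never reaches it, the threshold is infinite.
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    return sorted_scores[idx] if idx < len(sorted_scores) else np.inf
```

A prediction set would then contain every candidate label whose nonconformity score is at most the returned threshold. A single very large weight can dominate the normalized mass; clipping caps that influence at the cost of some bias, which is the trade-off the summary above describes.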

Core claim

Density ratios learned by clipped least-squares importance fitting, when used inside weighted conformal prediction, yield bounded expected undercoverage under covariate shifts; this undercoverage is corrected by inflating the nominal coverage target by an amount that can be estimated from the data, delivering dataset-conditional coverage whose sample complexity remains independent of higher moments of the true density ratio.
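The logic of the inflation step can be written schematically. The symbols below are illustrative and not the paper's notation: Δ stands for a bound on the undercoverage and Δ̂ for its data-driven estimate, and the sketch assumes the bound holds uniformly over target levels and that Δ̂ < α.

```latex
% Schematic only: \Delta and \hat{\Delta} are illustrative symbols, not the paper's notation.
\begin{align*}
\text{Assume, for every target level } a:\quad
  & \Pr\bigl(Y \in \widehat{C}_{a}(X)\bigr) \;\ge\; 1 - a - \Delta, \\
\text{choose the inflated level:}\quad
  & \alpha' = \alpha - \hat{\Delta}, \qquad \hat{\Delta} \ge \Delta, \\
\text{then:}\quad
  & \Pr\bigl(Y \in \widehat{C}_{\alpha'}(X)\bigr)
    \;\ge\; 1 - \alpha' - \Delta
    \;=\; 1 - \alpha + (\hat{\Delta} - \Delta)
    \;\ge\; 1 - \alpha .
\end{align*}
```

The argument only goes through if Δ̂ really does dominate Δ, which is why the way the estimate is obtained from data matters for the final guarantee.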

What carries the argument

Clipped least-squares importance fitting (CLISF) for density-ratio estimation inside weighted conformal prediction (WCP), paired with a data-estimated inflation of the coverage target.
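For orientation, here is a rough sketch of least-squares importance fitting followed by clipping. It uses the well-known closed-form ridge solution for a linear-in-features ratio model and clips the fitted ratio afterwards; the paper's CLISF may instead build the clip into the objective, which the abstract does not specify. The Gaussian basis, the names `gaussian_features` and `fit_clipped_ratio`, and all default values are assumptions made for the example.

```python
import numpy as np

def gaussian_features(x, centers, bandwidth):
    """Gaussian kernel basis functions evaluated at 1-D inputs x."""
    diff = np.asarray(x, dtype=float)[:, None] - np.asarray(centers, dtype=float)[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def fit_clipped_ratio(x_train, x_test, centers, bandwidth=1.0, ridge=1e-2, clip=10.0):
    """Fit r(x) ~ p_test(x) / p_train(x) by regularized least squares, then clip to [0, clip].

    Minimizes 0.5 * mean_train[r^2] - mean_test[r] + 0.5 * ridge * ||theta||^2
    over r(x) = features(x) @ theta, which has a closed-form solution.
    """
    centers = np.asarray(centers, dtype=float)
    phi_tr = gaussian_features(x_train, centers, bandwidth)
    phi_te = gaussian_features(x_test, centers, bandwidth)
    H = phi_tr.T @ phi_tr / len(phi_tr)   # second moment of features under training data
    h = phi_te.mean(axis=0)               # mean feature vector under test data
    theta = np.linalg.solve(H + ridge * np.eye(len(centers)), h)

    def ratio(x):
        raw = gaussian_features(x, centers, bandwidth) @ theta
        return np.clip(raw, 0.0, clip)    # hypothetical post-hoc clip at level `clip`

    return ratio
```

The returned callable can supply the weights for a weighted conformal threshold. The clip level plays the role of the parameter B in the figure captions, though how the paper selects it is not stated here.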

If this is right

  • Dataset-conditional coverage is achieved with sample complexity independent of higher moments of the density ratio.
  • Expected undercoverage remains bounded when CLISF weights are inserted into WCP.
  • The inflation needed to restore coverage can be estimated from the same training data used for the density ratios.
  • Experiments on synthetic and real-world benchmarks with covariate shift should be consistent with the theoretical guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same clipping-plus-inflation pattern may stabilize other importance-weighted procedures that currently require bounded density ratios.
  • Practitioners facing high-dimensional or heavy-tailed shift problems can apply the method without first verifying moment conditions on the ratio.
  • The data-driven inflation step suggests a general template for turning approximate coverage statements into exact ones in other conformal settings.

Load-bearing premise

The particular clipping rule inside least-squares importance fitting keeps expected undercoverage bounded and allows the needed inflation factor to be estimated from data without introducing bias that invalidates the coverage guarantee.

What would settle it

A dataset with known unbounded density ratio where the estimated inflation fails to bring empirical coverage up to the nominal level on fresh test points drawn from the shifted distribution.
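A test of this kind could be set up along the following lines. The sketch uses a pair of shifted unit-variance Gaussians, whose density ratio exp(βx − β²/2) is unbounded, and measures empirical coverage on fresh test points. The score model, the function name `empirical_coverage`, and the use of the true ratio rather than a learned one are assumptions made for the example. The paper's inflation estimator is not described here, so the sketch only reports coverage at a given target; an inflated target could be passed through `alpha`.

```python
import numpy as np

def empirical_coverage(beta, clip=None, alpha=0.2, n_cal=5000, n_test=5000, seed=0):
    """Empirical test coverage under a Gaussian mean shift of size `beta`.

    clip=None runs unweighted split conformal; otherwise the true density
    ratio exp(beta * x - beta**2 / 2) is clipped at `clip` and used as weights.
    """
    rng = np.random.default_rng(seed)
    x_cal = rng.normal(0.0, 1.0, n_cal)     # training/calibration covariates
    x_test = rng.normal(beta, 1.0, n_test)  # shifted test covariates
    # Hypothetical heteroscedastic scores: larger |x| gives larger scores.
    s_cal = (1.0 + np.abs(x_cal)) * np.abs(rng.normal(size=n_cal))
    s_test = (1.0 + np.abs(x_test)) * np.abs(rng.normal(size=n_test))

    def weight(x):
        if clip is None:
            return np.ones_like(x)
        return np.minimum(np.exp(beta * x - 0.5 * beta**2), clip)

    order = np.argsort(s_cal)
    s_sorted = s_cal[order]
    cum_w = np.cumsum(weight(x_cal)[order])
    w_test = weight(x_test)
    # Per-test-point weighted quantile; the test point carries mass at +infinity.
    target = (1.0 - alpha) * (cum_w[-1] + w_test)
    idx = np.searchsorted(cum_w, target, side="left")
    thresholds = np.where(idx < n_cal, s_sorted[np.minimum(idx, n_cal - 1)], np.inf)
    return float(np.mean(s_test <= thresholds))
```

Comparing `empirical_coverage(1.5, clip=None)` with `empirical_coverage(1.5, clip=20.0)` across several seeds gives a rough picture of how much of the shift-induced undercoverage the clipped weights recover before any inflation is applied.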

Figures

Figures reproduced from arXiv: 2605.02072 by James Wang, Surbhi Goel.

Figure 1: Experimental results on iWildCam. The solid colored lines show the distribution of coverage levels over 30 trials. The colored dotted lines represent average coverage levels. Qualitatively, better performance is given by a CDF which looks like a step function about 0.8. … method on the kept data and then find its coverage on the held-out test set. We used a coverage level of 1 − α = 0.8. This was repeated fo…
Figure 2: Coverage results for CWCP (B ∈ {2.5, 5, 10, 20}), split conformal, and WCP on synthetic shifted Gaussian data. The x-axis represents β. Qualitatively, good performance corresponds to a red line which is close to y = 0.8 (good expected coverage) and a small blue region (low variance).
Figure 3: Coverage results for CWCP (B ∈ {5, 10, 20, 40, 80}), split conformal, WCP, and LR-QR on Communities and Crime data. The colored bars represent average coverage and prediction set size for each algorithm and the black bars represent ±1 standard deviation. … split conformal displayed increasing levels of undercoverage with increasing β, whereas this was less of an issue for CWCP and WCP (which account for the…
Figure 4: Results for the structural risk-regularized CLISF objective. Qualitatively, the best choice of regularizer λ will correspond to a plot which most closely matches the bottommost plot: this is clearly attained when λ = 0.5.
Original abstract

Conformal prediction (CP) provides powerful, distribution-free prediction sets, but its guarantees rely on the exchangeability of training and test data, which is often violated in practice due to covariate shifts. While weighted conformal prediction (WCP) is designed to handle such shifts, it can suffer from significant undercoverage when the density ratio between the distributions is unbounded and/or must be learned. This is because of both overfitting in learning the density ratio, and high variance in estimating the nonconformity score threshold. To address this, we introduce clipped least-squares importance fitting (CLISF) as a reduced-variance method for density ratio estimation. Specifically, we show that density ratios learned using CLISF, when plugged into WCP, have bounded expected undercoverage. Furthermore, we show that the undercoverage can be corrected by running WCP with a slightly inflated coverage target; crucially, we are able to estimate the required level of inflation from the data. We provide the first theoretical guarantees for weight clipping in conformal inference, achieving dataset-conditional coverage with a sample complexity that does not blow up with the higher moments of the true density ratio -- a key limitation of prior work. We verify our results on real-world benchmarks and synthetic data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces clipped least-squares importance fitting (CLISF) for density ratio estimation to improve weighted conformal prediction (WCP) under covariate shifts with potentially unbounded ratios. It claims that CLISF yields bounded expected undercoverage when used in WCP, that this undercoverage can be corrected via a data-driven inflation of the target coverage level, and that the resulting procedure achieves dataset-conditional coverage with sample complexity independent of higher moments of the true density ratio. The paper positions these as the first theoretical guarantees for weight clipping in conformal inference and supports them with experiments on synthetic and real-world data.

Significance. If the central claims hold, the work addresses a practical limitation of existing WCP methods by providing robustness to unbounded shifts without sample complexity blow-up. The data-driven inflation correction, if valid, would be a useful practical tool. The emphasis on dataset-conditional coverage and explicit handling of learned weights distinguishes it from prior analyses that often rely on stronger assumptions or suffer from variance issues.

major comments (1)
  1. [Theoretical results on inflation correction] The central guarantee of dataset-conditional coverage after data-driven inflation of the coverage target (abstract and theoretical results) is load-bearing. The inflation factor is estimated from the same data used to compute calibration scores and learned weights; without an explicit argument (e.g., sample splitting, martingale construction, or independence lemma) showing that this estimation does not introduce dependence that voids the conditional coverage, the claim does not follow from the bounded-expected-undercoverage result alone.
minor comments (2)
  1. [Abstract] The abstract is dense; separating the statement of the bounded-undercoverage result, the inflation procedure, and the sample-complexity claim would improve readability.
  2. [Method description] The precise form of the clipping threshold and the loss function inside CLISF should be stated explicitly (including any data-dependent choices) to allow verification that the boundedness assumptions hold.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful review and for identifying a key point that requires clarification in our theoretical development. We address the major comment below and will revise the manuscript to strengthen the argument for dataset-conditional coverage under data-dependent inflation.

Point-by-point responses
  1. Referee: [Theoretical results on inflation correction] The central guarantee of dataset-conditional coverage after data-driven inflation of the coverage target (abstract and theoretical results) is load-bearing. The inflation factor is estimated from the same data used to compute calibration scores and learned weights; without an explicit argument (e.g., sample splitting, martingale construction, or independence lemma) showing that this estimation does not introduce dependence that voids the conditional coverage, the claim does not follow from the bounded-expected-undercoverage result alone.

    Authors: We agree that an explicit argument is needed to rigorously establish that estimating the inflation factor from the same data does not invalidate the dataset-conditional coverage guarantee. Our current analysis shows that CLISF yields bounded expected undercoverage and that a suitable inflation can be estimated from data to correct it in expectation, but the manuscript does not provide a self-contained independence or martingale argument separating the inflation estimation from the calibration scores. In the revision we will add such an argument, for example by (i) introducing an optional sample-splitting step that reserves a small fraction of calibration points exclusively for inflation estimation while preserving the conditional coverage on the remaining points, or (ii) constructing a suitable martingale that accounts for the dependence and shows the inflated threshold still yields the desired conditional coverage. We will also update the abstract and theorem statements to reflect the clarified conditions under which the guarantee holds.

    Revision: yes
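Option (i) in the response amounts to a disjoint partition of the calibration indices. A minimal sketch, assuming a uniformly random split, is given below; the function name `split_calibration` and the default 20% reserve are hypothetical and not taken from the paper or the rebuttal.

```python
import numpy as np

def split_calibration(n, inflation_fraction=0.2, seed=0):
    """Randomly partition indices 0..n-1 into two disjoint sets.

    The first set would be reserved for estimating the inflation amount, the
    second for computing calibration scores. The fraction is a hypothetical default.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    k = int(round(inflation_fraction * n))
    return perm[:k], perm[k:]
```

Because the two index sets never overlap, the inflation estimate is computed independently of the scores that define the threshold, at the cost of fewer calibration points for each task.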

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent theoretical bounds and data-driven correction.

Full rationale

The abstract and description present CLISF as a new reduced-variance estimator, followed by separate proofs of bounded expected undercoverage when plugged into WCP and a distinct data-driven inflation procedure to restore dataset-conditional coverage. No quoted equation or step reduces the coverage guarantee by construction to the fitted weights, the inflation estimator, or a self-citation chain. The sample-complexity claim directly contrasts with prior limitations rather than re-expressing the target in terms of the paper's own fitted quantities. This satisfies the default expectation of a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated beyond the standard conformal-prediction setup. The method CLISF is introduced as a new estimation procedure rather than a new postulated entity.

pith-pipeline@v0.9.0 · 5517 in / 1277 out tokens · 79581 ms · 2026-05-08T19:14:51.879354+00:00 · methodology
