Weight Clipping for Robust Conformal Inference under Unbounded Covariate Shifts

James Wang; Surbhi Goel

arXiv: 2605.02072 · v1 · submitted 2026-05-03 · 💻 cs.LG


Pith reviewed 2026-05-08 19:14 UTC · model grok-4.3


keywords: conformal prediction, covariate shift, density ratio estimation, weight clipping, importance fitting, prediction sets, robust inference


The pith

Clipped density ratio estimates restore reliable coverage in weighted conformal prediction even when the true ratio is unbounded.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard weighted conformal prediction loses its coverage guarantees when the density ratio between training and test distributions is unbounded or must be learned from finite samples. The paper introduces clipped least-squares importance fitting (CLISF) to estimate these ratios with controlled variance. Plugging the resulting clipped weights into weighted conformal prediction yields a method whose expected undercoverage is bounded. The remaining gap is closed by running the procedure at a modestly higher target coverage level, with the amount of inflation estimated directly from the observed data. The paper presents these as the first theoretical guarantees for weight clipping in conformal inference, and they hold with a sample complexity that does not blow up with the higher moments of the true density ratio.
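To make the plug-in step concrete, here is a minimal sketch of a weighted split-conformal threshold computed with clipped weights. It assumes nonnegative density-ratio estimates are already available for the calibration points and for the test point; the function name `clipped_wcp_threshold` and the parameter `clip_at` (the clipping level B) are illustrative, not the paper's code. Following standard WCP, the test point's own weight is placed on a score of +infinity, and the prediction set is every label whose score is at most the returned threshold.

```python
import numpy as np


def clipped_wcp_threshold(cal_scores, cal_weights, test_weight, alpha, clip_at):
    """Score threshold for weighted split conformal prediction with clipped weights.

    Hypothetical helper: every weight (calibration and test) is clipped at
    `clip_at` before use. The test point's weight sits on a score of +inf.
    """
    s = np.asarray(cal_scores, dtype=float)
    w = np.minimum(np.asarray(cal_weights, dtype=float), clip_at)
    w_test = min(float(test_weight), clip_at)

    # Append the test point's mass at +inf and normalise to a probability vector.
    scores = np.append(s, np.inf)
    probs = np.append(w, w_test)
    probs = probs / probs.sum()

    # Weighted (1 - alpha) quantile: smallest score whose cumulative weight
    # reaches 1 - alpha. The tiny tolerance absorbs floating-point shortfall.
    order = np.argsort(scores)
    cum = np.cumsum(probs[order])
    idx = int(np.searchsorted(cum, 1.0 - alpha - 1e-12, side="left"))
    idx = min(idx, len(scores) - 1)
    return scores[order][idx]
```

With `clip_at=np.inf` this reduces to ordinary WCP. Lowering it caps how far any single extreme weight can drag the quantile, which is the variance-control role clipping plays in the summary above.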

Core claim

Density ratios learned by clipped least-squares importance fitting, when used inside weighted conformal prediction, yield bounded expected undercoverage under covariate shifts; this undercoverage is corrected by inflating the nominal coverage target by an amount that can be estimated from the data, delivering dataset-conditional coverage whose sample complexity remains independent of higher moments of the true density ratio.

What carries the argument

Clipped least-squares importance fitting (CLISF) for density-ratio estimation inside weighted conformal prediction (WCP), paired with a data-estimated inflation of the coverage target.
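As a rough illustration of why clipping inside least-squares importance fitting targets a clipped ratio, assume (this is an assumption; the paper's exact CLISF loss is not reproduced here) that CLISF minimizes the standard least-squares importance-fitting objective with the model output clipped at a level B. Writing p and q for the training and test covariate densities and w = q/p for the true ratio, the population objective and its pointwise minimizer are:

```latex
\begin{aligned}
J_B(r) &= \tfrac12\,\mathbb{E}_{X\sim p}\big[\min(r(X),B)^2\big] - \mathbb{E}_{X\sim q}\big[\min(r(X),B)\big] \\
       &= \int p(x)\Big[\tfrac12\,c(x)^2 - w(x)\,c(x)\Big]\,dx,
          \qquad c(x) = \min(r(x),B) \le B, \\
\arg\min_{c \le B}\Big[\tfrac12\,c^2 - w\,c\Big] &= \min(w,B)
  \quad\Longrightarrow\quad \min\big(r^\star(x),B\big) = \min\big(w(x),B\big).
\end{aligned}
```

The quadratic ½c² − wc is convex with unconstrained minimizer c = w, so restricting to c ≤ B gives min(w, B). Under this reading, the fitted weights never exceed B even when w itself is unbounded.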

If this is right

Dataset-conditional coverage is achieved with sample complexity independent of higher moments of the density ratio.
Expected undercoverage remains bounded when CLISF weights are inserted into WCP.
The inflation needed to restore coverage can be estimated from the same training data used for the density ratios.
The method's behavior is verified empirically on both synthetic data and real-world benchmarks with covariate shift.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same clipping-plus-inflation pattern may stabilize other importance-weighted procedures that currently require bounded density ratios.
Practitioners facing high-dimensional or heavy-tailed shift problems can apply the method without first verifying moment conditions on the ratio.
The data-driven inflation step suggests a general template for turning approximate coverage statements into exact ones in other conformal settings.

Load-bearing premise

The particular clipping rule inside least-squares importance fitting keeps expected undercoverage bounded and allows the needed inflation factor to be estimated from data without introducing bias that invalidates the coverage guarantee.
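An elementary calculation shows why clipping alone costs only a bounded amount of coverage, in the idealized case where the true ratio w = q/p is known and clipped at B (the paper's analysis concerns learned weights and is not reproduced here). WCP run with weights min(w, B) is valid for the tilted test distribution q_B defined below, so the coverage lost under the actual test distribution q is at most the total-variation distance between q and q_B:

```latex
\begin{aligned}
q_B(x) &= \frac{p(x)\,\min(w(x),B)}{Z}, \qquad
  Z = \mathbb{E}_p\big[\min(w,B)\big] = 1 - \varepsilon_B \le 1, \\
\varepsilon_B &= \mathbb{E}_p\big[(w-B)_+\big] = \mathbb{E}_q\big[(1 - B/w)_+\big] \le \Pr_q(w > B), \\
\mathrm{TV}(q,q_B) &= \int \big(q(x) - q_B(x)\big)_+\,dx
  = \mathbb{E}_p\big[(w - B/Z)_+\big] \le \varepsilon_B, \\
\Pr_q\big(Y \in \hat C(X)\big) &\ge \Pr_{q_B}\big(Y \in \hat C(X)\big) - \mathrm{TV}(q,q_B)
  \ge 1 - \alpha - \varepsilon_B .
\end{aligned}
```

The inflation that would restore 1 − α in this idealized case is ε_B, which depends on the unknown w. Per the abstract, the paper instead handles learned weights and estimates the required inflation from data.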

What would settle it

A dataset with known unbounded density ratio where the estimated inflation fails to bring empirical coverage up to the nominal level on fresh test points drawn from the shifted distribution.
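A small harness for the kind of experiment described above might look like the following. It is a sketch under stated assumptions: Gaussian covariate shift from N(0, 1) to N(μ, 1), whose ratio exp(μx − μ²/2) is unbounded; heteroscedastic noise so that the shift actually changes the score distribution; a known predictor f(x) = x; and the true ratio clipped at B in place of a learned CLISF estimate. The helper names are hypothetical, and the paper's inflation estimator is not implemented, so with small B some undercoverage is expected; that is the gap the inflation step is meant to close.

```python
import numpy as np


def weighted_threshold(scores, weights, test_weight, alpha):
    """WCP threshold: (1 - alpha) quantile of the weighted scores plus mass at +inf."""
    s = np.append(scores, np.inf)
    p = np.append(weights, test_weight)
    p = p / p.sum()
    order = np.argsort(s)
    cum = np.cumsum(p[order])
    # Small tolerance guards against floating-point shortfall in the cumulative sum.
    idx = min(int(np.searchsorted(cum, 1.0 - alpha - 1e-12)), len(s) - 1)
    return s[order][idx]


def simulate_coverage(mu=1.5, clip_at=20.0, alpha=0.2, n_cal=500, n_test=500, seed=0):
    """Empirical coverage of clipped WCP on fresh shifted test points (toy setting)."""
    rng = np.random.default_rng(seed)

    def ratio(x):
        # True density ratio N(mu, 1) / N(0, 1); unbounded as x grows.
        return np.exp(mu * x - mu ** 2 / 2)

    x_cal = rng.normal(0.0, 1.0, n_cal)    # calibration covariates ~ training law
    x_test = rng.normal(mu, 1.0, n_test)   # test covariates ~ shifted law
    # Noise scale grows with |x|, so the shift changes the score distribution.
    y_cal = x_cal + (1.0 + np.abs(x_cal)) * rng.normal(size=n_cal)
    y_test = x_test + (1.0 + np.abs(x_test)) * rng.normal(size=n_test)

    scores = np.abs(y_cal - x_cal)          # residual scores for the predictor f(x) = x
    w_cal = np.minimum(ratio(x_cal), clip_at)
    covered = 0
    for x, y in zip(x_test, y_test):
        t = weighted_threshold(scores, w_cal, min(ratio(x), clip_at), alpha)
        covered += int(abs(y - x) <= t)
    return covered / n_test


for B in (5.0, 20.0, np.inf):
    print(f"clip_at={B}: coverage={simulate_coverage(clip_at=B):.3f}")
```

Swapping the true ratio for a learned estimate, and adding an inflated level on top, would turn this into the falsification test described above.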

Figures

Figures reproduced from arXiv: 2605.02072 by James Wang, Surbhi Goel.

**Figure 1.** Experimental results on iWildCam. The solid colored lines show the distribution of coverage levels over 30 trials, and the colored dotted lines represent average coverage levels. Qualitatively, better performance corresponds to a CDF that looks like a step function at 0.8. … method on the kept data and then find its coverage on the held out test set. We used a coverage level of 1 − α = 0.8. This was repeated fo…

**Figure 2.** Coverage results for CWCP (B ∈ {2.5, 5, 10, 20}), split conformal, and WCP on synthetic shifted Gaussian data. The x-axis represents β. Qualitatively, good performance corresponds to a red line close to y = 0.8 (good expected coverage) and a small blue region (low variance).

**Figure 3.** Coverage results for CWCP (B ∈ {5, 10, 20, 40, 80}), split conformal, WCP, and LR-QR on Communities and Crime data. The colored bars represent average coverage and prediction set size for each algorithm, and the black bars represent ±1 standard deviation. Split conformal displayed increasing levels of undercoverage with increasing β, whereas this was less of an issue for CWCP and WCP (which account for the…

**Figure 4.** Results for the structural-risk-regularized CLISF objective. Qualitatively, the best choice of regularizer λ corresponds to the plot that most closely matches the bottommost plot; this is clearly attained at λ = 0.5.

Original abstract

Conformal prediction (CP) provides powerful, distribution-free prediction sets, but its guarantees rely on the exchangeability of training and test data, which is often violated in practice due to covariate shifts. While weighted conformal prediction (WCP) is designed to handle such shifts, it can suffer from significant undercoverage when the density ratio between the distributions is unbounded and/or must be learned. This is because of both overfitting in learning the density ratio, and high variance in estimating the nonconformity score threshold. To address this, we introduce clipped least-squares importance fitting (CLISF) as a reduced-variance method for density ratio estimation. Specifically, we show that density ratios learned using CLISF, when plugged into WCP, have bounded expected undercoverage. Furthermore, we show that the undercoverage can be corrected by running WCP with a slightly inflated coverage target; crucially, we are able to estimate the required level of inflation from the data. We provide the first theoretical guarantees for weight clipping in conformal inference, achieving dataset-conditional coverage with a sample complexity that does not blow up with the higher moments of the true density ratio -- a key limitation of prior work. We verify our results on real-world benchmarks and synthetic data.
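The variance issue the abstract mentions can be seen through the Kish effective sample size, (sum of weights)² / (sum of squared weights), of the calibration weights. The sketch below assumes Gaussian covariate shift from N(0, 1) to N(μ, 1) with μ = 2 and uses the true ratio rather than a learned one; it shows how a few extreme weights collapse the effective sample size and how clipping at a level B restores it. The helper `kish_ess` is illustrative.

```python
import numpy as np


def kish_ess(w):
    """Kish effective sample size (sum w)^2 / sum(w^2) of nonnegative weights."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


rng = np.random.default_rng(0)
mu = 2.0
x = rng.normal(0.0, 1.0, 2000)           # covariates drawn from the training law N(0, 1)
w = np.exp(mu * x - mu ** 2 / 2)         # true ratio N(mu, 1) / N(0, 1): unbounded in x
for B in (np.inf, 50.0, 10.0, 2.0):
    print(f"B={B}: ESS={kish_ess(np.minimum(w, B)):.1f} of {len(w)}")
```

Clipping can only increase this effective sample size (it is non-increasing in B), at the cost of a bias toward the tilted distribution; that bias is what the inflated coverage target is meant to absorb.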

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives the first guarantees for clipped least-squares importance fitting in weighted conformal prediction under unbounded shifts, with a data-driven inflation fix, but the conditional coverage after inflation needs a close look.


This paper introduces clipped least-squares importance fitting to estimate density ratios for weighted conformal prediction. They show that the clipping produces bounded expected undercoverage even when the true ratio is unbounded, and that you can recover the target coverage by inflating the level slightly, with the inflation amount estimated from the data. They also claim a sample-complexity bound that avoids blowing up with higher moments of the ratio, which prior work struggled with.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces clipped least-squares importance fitting (CLISF) for density ratio estimation to improve weighted conformal prediction (WCP) under covariate shifts with potentially unbounded ratios. It claims that CLISF yields bounded expected undercoverage when used in WCP, that this undercoverage can be corrected via a data-driven inflation of the target coverage level, and that the resulting procedure achieves dataset-conditional coverage with sample complexity independent of higher moments of the true density ratio. The paper positions these as the first theoretical guarantees for weight clipping in conformal inference and supports them with experiments on synthetic and real-world data.

Significance. If the central claims hold, the work addresses a practical limitation of existing WCP methods by providing robustness to unbounded shifts without sample complexity blow-up. The data-driven inflation correction, if valid, would be a useful practical tool. The emphasis on dataset-conditional coverage and explicit handling of learned weights distinguishes it from prior analyses that often rely on stronger assumptions or suffer from variance issues.

major comments (1)

[Theoretical results on inflation correction] The central guarantee of dataset-conditional coverage after data-driven inflation of the coverage target (abstract and theoretical results) is load-bearing. The inflation factor is estimated from the same data used to compute calibration scores and learned weights; without an explicit argument (e.g., sample splitting, martingale construction, or independence lemma) showing that this estimation does not introduce dependence that voids the conditional coverage, the claim does not follow from the bounded-expected-undercoverage result alone.

minor comments (2)

[Abstract] The abstract is dense; separating the statement of the bounded-undercoverage result, the inflation procedure, and the sample-complexity claim would improve readability.
[Method description] The precise form of the clipping threshold and the loss function inside CLISF should be stated explicitly (including any data-dependent choices) to allow verification that the boundedness assumptions hold.
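For concreteness, one hypothetical form such an explicit statement could take is sketched below: a parametric ratio model r(x) = exp(θ₀ + θ₁x) fitted by gradient descent on the least-squares importance-fitting loss with the model output clipped at B. The model family, optimizer, step size, and function names are all assumptions for illustration; the paper's CLISF objective (including any regularization, such as the structural-risk penalty in Figure 4) may differ.

```python
import numpy as np


def clipped_lsif_loss_and_grad(theta, x_train, x_test, clip_at):
    """Loss 0.5*mean_train[min(r,B)^2] - mean_test[min(r,B)] for r(x) = exp(theta0 + theta1*x)."""
    def feats(x):
        return np.stack([np.ones_like(x), x], axis=1)

    phi_tr, phi_te = feats(x_train), feats(x_test)
    r_tr, r_te = np.exp(phi_tr @ theta), np.exp(phi_te @ theta)
    c_tr, c_te = np.minimum(r_tr, clip_at), np.minimum(r_te, clip_at)
    loss = 0.5 * np.mean(c_tr ** 2) - np.mean(c_te)

    # d/dtheta min(r, B) = 1{r < B} * r * phi, and zero where the clip is active.
    g_tr = ((r_tr < clip_at) * r_tr ** 2)[:, None] * phi_tr
    g_te = ((r_te < clip_at) * r_te)[:, None] * phi_te
    grad = g_tr.mean(axis=0) - g_te.mean(axis=0)
    return loss, grad


def fit_clipped_lsif(x_train, x_test, clip_at=10.0, lr=0.01, steps=3000):
    """Plain gradient descent; returns the parameters and the loss history."""
    theta = np.zeros(2)  # r = 1 everywhere: below the clip, so gradients flow
    history = []
    for _ in range(steps):
        loss, grad = clipped_lsif_loss_and_grad(theta, x_train, x_test, clip_at)
        history.append(loss)
        theta -= lr * grad
    return theta, history


rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, 1000)   # training covariates ~ N(0, 1)
x_te = rng.normal(1.0, 1.0, 1000)   # test covariates ~ N(1, 1)
theta, hist = fit_clipped_lsif(x_tr, x_te)
print(theta)  # the true log-ratio here is -0.5 + 1.0 * x
```

Stating the loss this explicitly also makes it easy to check the boundedness assumption the referee asks about: the fitted weights used downstream are min(r(x), B) ≤ B by construction.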

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful review and for identifying a key point that requires clarification in our theoretical development. We address the major comment below and will revise the manuscript to strengthen the argument for dataset-conditional coverage under data-dependent inflation.


Referee: [Theoretical results on inflation correction] The central guarantee of dataset-conditional coverage after data-driven inflation of the coverage target (abstract and theoretical results) is load-bearing. The inflation factor is estimated from the same data used to compute calibration scores and learned weights; without an explicit argument (e.g., sample splitting, martingale construction, or independence lemma) showing that this estimation does not introduce dependence that voids the conditional coverage, the claim does not follow from the bounded-expected-undercoverage result alone.

Authors: We agree that an explicit argument is needed to rigorously establish that estimating the inflation factor from the same data does not invalidate the dataset-conditional coverage guarantee. Our current analysis shows that CLISF yields bounded expected undercoverage and that a suitable inflation can be estimated from data to correct it in expectation, but the manuscript does not provide a self-contained independence or martingale argument separating the inflation estimation from the calibration scores. In the revision we will add such an argument, for example by (i) introducing an optional sample-splitting step that reserves a small fraction of calibration points exclusively for inflation estimation while preserving the conditional coverage on the remaining points, or (ii) constructing a suitable martingale that accounts for the dependence and shows the inflated threshold still yields the desired conditional coverage. We will also update the abstract and theorem statements to reflect the clarified conditions under which the guarantee holds.

Revision: yes
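Option (i) can be sketched as follows, assuming the density-ratio weights are fitted on data disjoint from both calibration subsets. `split_calibration` and `inflated_alpha` are hypothetical helpers, and `delta_hat` stands in for whatever inflation estimate is computed on the reserved subset; the paper's estimator is not reproduced here.

```python
import numpy as np


def split_calibration(n, reserve_frac, rng):
    """Randomly split calibration indices into disjoint (main, reserved) index sets."""
    perm = rng.permutation(n)
    n_res = int(np.ceil(reserve_frac * n))
    return perm[n_res:], perm[:n_res]


def inflated_alpha(alpha, delta_hat, floor=1e-6):
    """Miscoverage level for running WCP at the inflated target 1 - alpha + delta_hat.

    `delta_hat` is a hypothetical inflation estimate computed only from the
    reserved indices; `floor` keeps the level strictly positive.
    """
    return max(alpha - delta_hat, floor)


rng = np.random.default_rng(0)
main_idx, reserved_idx = split_calibration(1000, 0.2, rng)
# delta_hat would be estimated on reserved_idx; WCP then uses only main_idx
# at miscoverage inflated_alpha(alpha, delta_hat).
```

Because `delta_hat` depends only on the reserved points, conditioning on it leaves the main calibration scores untouched, which is the kind of independence the referee asks for; the price is a smaller calibration set for the threshold itself.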

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent theoretical bounds and data-driven correction.


The abstract and description present CLISF as a new reduced-variance estimator, followed by separate proofs of bounded expected undercoverage when plugged into WCP and a distinct data-driven inflation procedure to restore dataset-conditional coverage. No quoted equation or step reduces the coverage guarantee by construction to the fitted weights, the inflation estimator, or a self-citation chain. The sample-complexity claim directly contrasts with prior limitations rather than re-expressing the target in terms of the paper's own fitted quantities. This satisfies the default expectation of a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated beyond the standard conformal-prediction setup. The method CLISF is introduced as a new estimation procedure rather than a new postulated entity.



