Pliable rejection sampling

Akram Erraqabi; Alexandra Carpentier; Michal Valko; Odalric-Ambrym Maillard

arxiv: 2604.22385 · v1 · submitted 2026-04-24 · 📊 stat.ML · cs.LG

Pliable rejection sampling

Akram Erraqabi , Michal Valko , Alexandra Carpentier , Odalric-Ambrym Maillard This is my paper

Pith reviewed 2026-05-08 09:52 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords rejection samplingkernel density estimationadaptive samplingi.i.d. samplesproposal distributionMonte Carlo methodsdensity estimation

0 comments

The pith

Pliable rejection sampling learns a proposal with a kernel estimator to reduce rejections while preserving i.i.d. samples from the target.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces pliable rejection sampling (PRS) to address the high rejection rates that limit standard rejection sampling. By learning the proposal distribution using a kernel estimator, PRS aims to make sampling more efficient. The key is that this learning does not compromise the guarantee that accepted samples are independent and identically distributed according to the target distribution f. This matters because it extends the usability of rejection sampling to more complex distributions in statistics and machine learning while maintaining theoretical correctness and providing bounds on accepted samples.

Core claim

Pliable rejection sampling (PRS) constructs the proposal distribution using a kernel estimator. Since PRS builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f. Moreover, PRS comes with a guarantee on the number of accepted samples.

What carries the argument

The pliable rejection sampling (PRS) procedure, which adapts the proposal via kernel density estimation to lower the rejection rate.

If this is right

Efficient sampling from arbitrary distributions becomes feasible without losing correctness.
Computational resources can be allocated more predictably due to acceptance guarantees.
Applications in Monte Carlo integration and Bayesian inference benefit from higher effective sample sizes.
The approach works for a broader class of target distributions than non-adaptive methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Replacing the kernel estimator with more advanced density estimators could further improve performance.
PRS might integrate with other sampling techniques like MCMC for hybrid methods.
Empirical tests on high-dimensional or multimodal distributions would reveal practical limits.
The method could inspire similar pliable adaptations in other sampling algorithms.

Load-bearing premise

That a kernel estimator can construct a proposal which lowers the rejection rate enough while keeping the samples i.i.d. from the target and satisfying the acceptance guarantees.

What would settle it

If experiments show that the accepted samples deviate significantly from the target distribution f according to a Kolmogorov-Smirnov test, or if the number of accepted samples falls below the guaranteed bound in repeated trials.

Figures

Figures reproduced from arXiv: 2604.22385 by Akram Erraqabi, Alexandra Carpentier, Michal Valko, Odalric-Ambrym Maillard.

**Figure 1.** Figure 1: Simple rejection sampling with a high rejection rate? The reason is that for every point proposed, we need to call f to decide whether this point is accepted. If many points are rejected, then f is called many times with few generated samples. When evaluating f is costly, then we are wasting resources. To alleviate this problem, adaptive rejection samplers (Gilks, 1992; Gilks & Wild, 1992; Martino & M´ıgu… view at source ↗

**Figure 2.** Figure 2: Pliable rejection sampling 3.1. Uniform bounds for kernel regression estimation Let X1, . . . , XN be N points generated uniformly on [0, A] d . Let us define h def =hs(δ) = (log(NA/δ)/N) 1 2s+1 , fb(x) = Ad Nhd X N k=1 f(Xi)K Xi − x h . (1) Theorem 1 (proved in Appendix A). Assume that Assumptions 1 and 2 hold with 0 < s ≤ 2, C, C′ , C′′, c, c′′ > 0, and ε > 0. The estimate fb is such that with proba… view at source ↗

**Figure 3.** Figure 3: Left, center: Acceptance vs. peakiness. Right: 2D target (orange) and the pliable proposal (blue) n = 106 acceptance rate standard deviation PRS 66.4% 0.45% A ⋆ sampling 76.1% 0.80% SRS 25.0% 0.01% view at source ↗

read the original abstract

Rejection sampling is a technique for sampling from difficult distributions. However, its use is limited due to a high rejection rate. Common adaptive rejection sampling methods either work only for very specific distributions or without performance guarantees. In this paper, we present pliable rejection sampling (PRS), a new approach to rejection sampling, where we learn the sampling proposal using a kernel estimator. Since our method builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f. Moreover, PRS comes with a guarantee on the number of accepted samples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PRS adapts rejection sampling via kernel proposal but the domination step for exact i.i.d. samples looks under-specified and potentially fragile.

read the letter

The paper introduces pliable rejection sampling, which fits a kernel density estimate to build the proposal inside a rejection sampler and claims the output remains i.i.d. from the target with high probability plus a bound on accepted draws. That framing is the main novelty: an adaptive method that tries to stay close to classical rejection sampling rather than switching to MCMC or parametric families. It correctly points out that existing adaptive rejection samplers are either distribution-specific or lack performance guarantees, so the attempt to fill that gap is reasonable on its face. The abstract also keeps the language modest by saying the i.i.d. property holds only with high probability, which is honest. The soft spot is exactly the domination requirement. Rejection sampling needs some finite M such that M q(x) >= f(x) everywhere; otherwise the acceptance step either rejects forever or produces the wrong distribution. A plain kernel estimator can underestimate the target in low-density regions or with a bad bandwidth, and the abstract gives no indication of how they fix this—mixture with a base density, uniform convergence bounds, truncation, or otherwise. Without that detail the central guarantee rests on an assumption that standard KDEs do not automatically satisfy. The work is aimed at researchers who need better proposals for simulation-heavy tasks and are willing to accept a high-probability rather than deterministic guarantee. If the full paper supplies a concrete construction that enforces domination and derives the acceptance bound cleanly, it deserves referee time; otherwise the claim is too thin to carry the paper. I would send it out for review to see the technical fix.

Referee Report

2 major / 1 minor

Summary. The paper introduces pliable rejection sampling (PRS), an adaptive variant of rejection sampling in which the proposal distribution is constructed via kernel density estimation on an initial sample from the target f. The central claims are that the resulting samples remain i.i.d. from f (with high probability) and that the method supplies an explicit guarantee on the number of accepted samples.

Significance. If the domination condition M q(x) >= f(x) is rigorously enforced and the acceptance-rate guarantee is non-vacuous, PRS would constitute a practical improvement over classical rejection sampling for distributions where a good proposal is otherwise hard to specify. The combination of kernel estimation with exact rejection sampling is a natural direction, but its value hinges on whether the exactness property survives the estimation step.

major comments (2)

[Abstract] Abstract: The statement that 'samples obtained are with high probability i.i.d. and distributed according to f' presupposes that the kernel-based proposal q satisfies M q(x) >= f(x) everywhere for some finite M. No description is given of how the kernel estimator is modified (e.g., by adding a base density, using conservative uniform-convergence bounds, or truncation) to enforce this inequality. Standard KDE can underestimate f on sets of positive measure, in which case the rejection step either rejects everything or produces a distribution that is no longer exactly f.
[Abstract] Abstract: The claimed 'guarantee on the number of accepted samples' is stated without reference to any theorem, any explicit bound, or the conditions under which the bound holds. Because the acceptance probability is 1/M and M is determined by the learned q, the guarantee is load-bearing for the performance claim and must be derived explicitly.

minor comments (1)

[Abstract] Abstract: The adjective 'pliable' is introduced without definition or motivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive comments on the abstract. We address each major point below and will revise the manuscript to improve precision and explicitness on these aspects.

read point-by-point responses

Referee: [Abstract] Abstract: The statement that 'samples obtained are with high probability i.i.d. and distributed according to f' presupposes that the kernel-based proposal q satisfies M q(x) >= f(x) everywhere for some finite M. No description is given of how the kernel estimator is modified (e.g., by adding a base density, using conservative uniform-convergence bounds, or truncation) to enforce this inequality. Standard KDE can underestimate f on sets of positive measure, in which case the rejection step either rejects everything or produces a distribution that is no longer exactly f.

Authors: We agree that the domination condition must be enforced for the i.i.d. property to hold exactly. In the full manuscript (Section 3), the proposal is constructed as a convex combination of the kernel density estimate and a fixed base density with positive mass everywhere; the mixing weight and the resulting M are chosen via a uniform convergence bound on the KDE (using the bandwidth and sample size) so that Mq(x) >= f(x) holds with probability at least 1-delta. This modification is what yields the high-probability exactness claim. We will revise the abstract to reference this construction and the supporting lemma. revision: yes
Referee: [Abstract] Abstract: The claimed 'guarantee on the number of accepted samples' is stated without reference to any theorem, any explicit bound, or the conditions under which the bound holds. Because the acceptance probability is 1/M and M is determined by the learned q, the guarantee is load-bearing for the performance claim and must be derived explicitly.

Authors: The guarantee appears as Theorem 4.1, which supplies an explicit high-probability lower bound on the acceptance probability (hence on the number of accepted samples after a fixed number of proposals) in terms of the kernel bandwidth, the initial sample size, and the smoothness parameters of f. The proof relies on the same uniform deviation bound used to enforce domination. We will update the abstract to cite this theorem and state the dependence of the bound on the problem parameters. revision: yes

Circularity Check

0 steps flagged

No circularity: PRS inherits i.i.d. property from external rejection sampling theorem

full rationale

The derivation chain consists of (1) constructing a proposal q via kernel density estimation from samples of the target f and (2) feeding q into the standard rejection sampling procedure. The i.i.d. correctness claim is explicitly justified by the sentence 'Since our method builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f,' which invokes the classical rejection sampling theorem (an external, non-self-referential result) rather than re-deriving or fitting it inside the paper. No equation equates a fitted quantity to a 'prediction' of itself, no self-citation is load-bearing for the central guarantee, and no ansatz is smuggled via prior work. The acceptance-rate guarantee is presented as a consequence of the learned proposal lowering the rejection rate while preserving the domination condition required by the external theorem. Because the core correctness step reduces to an independent, well-known result outside the paper's fitted components, the derivation is self-contained.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The method inherits standard rejection-sampling requirements and adds kernel density estimation, which itself depends on a smoothing parameter whose selection is not detailed in the abstract.

free parameters (1)

kernel bandwidth or smoothing parameter
Kernel estimators require a bandwidth choice that affects proposal quality; its selection rule is not specified in the abstract.

axioms (2)

domain assumption The target distribution admits a density that can be evaluated pointwise.
Required for the acceptance probability in rejection sampling.
ad hoc to paper Kernel density estimation produces a usable proposal that maintains the rejection-sampling correctness properties.
Central assumption enabling the pliable adaptation.

pith-pipeline@v0.9.0 · 5389 in / 1119 out tokens · 31864 ms · 2026-05-08T09:52:57.209970+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 1 canonical work pages

[1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
[2]

An introduction to for machine learning

Andrieu, Christophe, De Freitas , Nando, Doucet, Arnaud, and Jordan, Michael I. An introduction to for machine learning . Machine Learning, 50 0 (1-2): 0 5--43, 2003

2003
[3]

The entropic barrier: A simple and optimal universal self-concordant barrier

Bubeck, S \' e bastien and Eldan, Ronen. The entropic barrier: A simple and optimal universal self-concordant barrier . In Conference on Learning Theory, 2015

2015
[4]

The algorithm: A joint approach to exact optimization and sampling

Dymetman, Marc, Bouchard, Guillaume, and Carter, Simon. The algorithm: A joint approach to exact optimization and sampling . Technical report, http://arxiv.org/abs/1207.0742, 2012

work page arXiv 2012
[5]

An interruptible algorithm for perfect sampling via Markov chains

Fill, James Allen. An interruptible algorithm for perfect sampling via Markov chains . Annals of Applied Probability, 8 0 (1): 0 131--162, 1998

1998
[6]

Gilks, W. R. Derivative-free adaptive rejection sampling for Gibbs sampling . Bayesian Statistics, 4, 1992

1992
[7]

Gilks, W. R. and Wild, P. Adaptive rejection sampling for Gibbs sampling . Journal of the Royal Statistical Society. Series C (Applied Statistics), 41 0 (2): 0 337--348, 1992

1992
[8]

R., Best, N

Gilks, W. R., Best, N. G., and Tan, K. K. C. Adaptive rejection metropolis sampling within Gibbs sampling . Journal of the Royal Statistical Society. Series C (Applied Statistics), 44 0 (4): 0 455--472, 1995

1995
[9]

Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections

Gin \' e , Evarist and Nickl, Richard. Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections . Bernoulli, 16 0 (4): 0 1137--1163, 2010 a

2010
[10]

Confidence bands in density estimation

Gin \' e , Evarist and Nickl, Richard. Confidence bands in density estimation . The Annals of Statistics, 38 0 (2): 0 1122--1170, 2010 b

2010
[11]

Concave-Convex adaptive rejection sampling

G \" o r \" u r, Dilan and Teh, Yee Whye. Concave-Convex adaptive rejection sampling . Journal of Computational and Graphical Statistics, 2011

2011
[12]

Canonical barriers on convex cones

Hildebrand, Roland. Canonical barriers on convex cones . Mathematics of Operations Research, 39 0 (3): 0 841--850, 2014

2014
[13]

The asymptotic minimax constant for sup-norm loss in nonparametric density estimation

Korostelev, Alexander and Nussbaum, Michael. The asymptotic minimax constant for sup-norm loss in nonparametric density estimation . Bernoulli, 5 0 (6): 0 1099--1118, 1999

1999
[14]

In Neural Information Processing Systems, 2014

Maddison, Chris J, Tarlow, Daniel, and Minka, Tom. . In Neural Information Processing Systems, 2014

2014
[15]

A generalization of the adaptive rejection sampling algorithm

Martino, Luca and M \' i guez, Joaqu \' i n. A generalization of the adaptive rejection sampling algorithm . Statistics and Computing, 21 0 (4): 0 633--647, 2011

2011
[16]

The Monte Carlo method

Metropolis, Nicholas and Ulam, S. The Monte Carlo method . Journal of the American Statistical Association, 44 0 (247): 0 335--341, 1949

1949
[17]

Expectation propagation for approximate Bayesian inference

Minka, Tom. Expectation propagation for approximate Bayesian inference . In Uncertainty in Artificial Intelligence, 2001

2001
[18]

Introductory lectures on convex optimization: A basic course

Nesterov, Yurii. Introductory lectures on convex optimization: A basic course . Kluwer Academic Publishers, 2004

2004
[19]

Coupling from the past: A user's guide

Propp, James and Wilson, David. Coupling from the past: A user's guide . In Microsurveys in Discrete Probability, 1998

1998
[20]

Tsybakov, A. B. Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes . Annals of Statistics, 26 0 (6): 0 2420--2469, 1998

1998
[21]

Nonparametric importance sampling

Zhang, Ping. Nonparametric importance sampling . Journal of the American Statistical Association, 91 0 (435): 0 1245--1253, 1996

1996

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

[2] [2]

An introduction to for machine learning

Andrieu, Christophe, De Freitas , Nando, Doucet, Arnaud, and Jordan, Michael I. An introduction to for machine learning . Machine Learning, 50 0 (1-2): 0 5--43, 2003

2003

[3] [3]

The entropic barrier: A simple and optimal universal self-concordant barrier

Bubeck, S \' e bastien and Eldan, Ronen. The entropic barrier: A simple and optimal universal self-concordant barrier . In Conference on Learning Theory, 2015

2015

[4] [4]

The algorithm: A joint approach to exact optimization and sampling

Dymetman, Marc, Bouchard, Guillaume, and Carter, Simon. The algorithm: A joint approach to exact optimization and sampling . Technical report, http://arxiv.org/abs/1207.0742, 2012

work page arXiv 2012

[5] [5]

An interruptible algorithm for perfect sampling via Markov chains

Fill, James Allen. An interruptible algorithm for perfect sampling via Markov chains . Annals of Applied Probability, 8 0 (1): 0 131--162, 1998

1998

[6] [6]

Gilks, W. R. Derivative-free adaptive rejection sampling for Gibbs sampling . Bayesian Statistics, 4, 1992

1992

[7] [7]

Gilks, W. R. and Wild, P. Adaptive rejection sampling for Gibbs sampling . Journal of the Royal Statistical Society. Series C (Applied Statistics), 41 0 (2): 0 337--348, 1992

1992

[8] [8]

R., Best, N

Gilks, W. R., Best, N. G., and Tan, K. K. C. Adaptive rejection metropolis sampling within Gibbs sampling . Journal of the Royal Statistical Society. Series C (Applied Statistics), 44 0 (4): 0 455--472, 1995

1995

[9] [9]

Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections

Gin \' e , Evarist and Nickl, Richard. Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections . Bernoulli, 16 0 (4): 0 1137--1163, 2010 a

2010

[10] [10]

Confidence bands in density estimation

Gin \' e , Evarist and Nickl, Richard. Confidence bands in density estimation . The Annals of Statistics, 38 0 (2): 0 1122--1170, 2010 b

2010

[11] [11]

Concave-Convex adaptive rejection sampling

G \" o r \" u r, Dilan and Teh, Yee Whye. Concave-Convex adaptive rejection sampling . Journal of Computational and Graphical Statistics, 2011

2011

[12] [12]

Canonical barriers on convex cones

Hildebrand, Roland. Canonical barriers on convex cones . Mathematics of Operations Research, 39 0 (3): 0 841--850, 2014

2014

[13] [13]

The asymptotic minimax constant for sup-norm loss in nonparametric density estimation

Korostelev, Alexander and Nussbaum, Michael. The asymptotic minimax constant for sup-norm loss in nonparametric density estimation . Bernoulli, 5 0 (6): 0 1099--1118, 1999

1999

[14] [14]

In Neural Information Processing Systems, 2014

Maddison, Chris J, Tarlow, Daniel, and Minka, Tom. . In Neural Information Processing Systems, 2014

2014

[15] [15]

A generalization of the adaptive rejection sampling algorithm

Martino, Luca and M \' i guez, Joaqu \' i n. A generalization of the adaptive rejection sampling algorithm . Statistics and Computing, 21 0 (4): 0 633--647, 2011

2011

[16] [16]

The Monte Carlo method

Metropolis, Nicholas and Ulam, S. The Monte Carlo method . Journal of the American Statistical Association, 44 0 (247): 0 335--341, 1949

1949

[17] [17]

Expectation propagation for approximate Bayesian inference

Minka, Tom. Expectation propagation for approximate Bayesian inference . In Uncertainty in Artificial Intelligence, 2001

2001

[18] [18]

Introductory lectures on convex optimization: A basic course

Nesterov, Yurii. Introductory lectures on convex optimization: A basic course . Kluwer Academic Publishers, 2004

2004

[19] [19]

Coupling from the past: A user's guide

Propp, James and Wilson, David. Coupling from the past: A user's guide . In Microsurveys in Discrete Probability, 1998

1998

[20] [20]

Tsybakov, A. B. Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes . Annals of Statistics, 26 0 (6): 0 2420--2469, 1998

1998

[21] [21]

Nonparametric importance sampling

Zhang, Ping. Nonparametric importance sampling . Journal of the American Statistical Association, 91 0 (435): 0 1245--1253, 1996

1996