Pliable rejection sampling
Pith reviewed 2026-05-08 09:52 UTC · model grok-4.3
The pith
Pliable rejection sampling learns a proposal with a kernel estimator to reduce rejections while preserving i.i.d. samples from the target.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pliable rejection sampling (PRS) constructs the proposal distribution using a kernel estimator. Since PRS builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f. Moreover, PRS comes with a guarantee on the number of accepted samples.
What carries the argument
The pliable rejection sampling (PRS) procedure, which adapts the proposal via kernel density estimation to lower the rejection rate.
If this is right
- Efficient sampling from arbitrary distributions becomes feasible without losing correctness.
- Computational resources can be allocated more predictably due to acceptance guarantees.
- Applications in Monte Carlo integration and Bayesian inference benefit from higher effective sample sizes.
- The approach works for a broader class of target distributions than non-adaptive methods.
Where Pith is reading between the lines
- Replacing the kernel estimator with more advanced density estimators could further improve performance.
- PRS might integrate with other sampling techniques like MCMC for hybrid methods.
- Empirical tests on high-dimensional or multimodal distributions would reveal practical limits.
- The method could inspire similar pliable adaptations in other sampling algorithms.
Load-bearing premise
That a kernel estimator can construct a proposal which lowers the rejection rate enough while keeping the samples i.i.d. from the target and satisfying the acceptance guarantees.
What would settle it
If experiments show that the accepted samples deviate significantly from the target distribution f according to a Kolmogorov-Smirnov test, or if the number of accepted samples falls below the guaranteed bound in repeated trials.
Figures
read the original abstract
Rejection sampling is a technique for sampling from difficult distributions. However, its use is limited due to a high rejection rate. Common adaptive rejection sampling methods either work only for very specific distributions or without performance guarantees. In this paper, we present pliable rejection sampling (PRS), a new approach to rejection sampling, where we learn the sampling proposal using a kernel estimator. Since our method builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f. Moreover, PRS comes with a guarantee on the number of accepted samples.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces pliable rejection sampling (PRS), an adaptive variant of rejection sampling in which the proposal distribution is constructed via kernel density estimation on an initial sample from the target f. The central claims are that the resulting samples remain i.i.d. from f (with high probability) and that the method supplies an explicit guarantee on the number of accepted samples.
Significance. If the domination condition M q(x) >= f(x) is rigorously enforced and the acceptance-rate guarantee is non-vacuous, PRS would constitute a practical improvement over classical rejection sampling for distributions where a good proposal is otherwise hard to specify. The combination of kernel estimation with exact rejection sampling is a natural direction, but its value hinges on whether the exactness property survives the estimation step.
major comments (2)
- [Abstract] Abstract: The statement that 'samples obtained are with high probability i.i.d. and distributed according to f' presupposes that the kernel-based proposal q satisfies M q(x) >= f(x) everywhere for some finite M. No description is given of how the kernel estimator is modified (e.g., by adding a base density, using conservative uniform-convergence bounds, or truncation) to enforce this inequality. Standard KDE can underestimate f on sets of positive measure, in which case the rejection step either rejects everything or produces a distribution that is no longer exactly f.
- [Abstract] Abstract: The claimed 'guarantee on the number of accepted samples' is stated without reference to any theorem, any explicit bound, or the conditions under which the bound holds. Because the acceptance probability is 1/M and M is determined by the learned q, the guarantee is load-bearing for the performance claim and must be derived explicitly.
minor comments (1)
- [Abstract] Abstract: The adjective 'pliable' is introduced without definition or motivation.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the constructive comments on the abstract. We address each major point below and will revise the manuscript to improve precision and explicitness on these aspects.
read point-by-point responses
-
Referee: [Abstract] Abstract: The statement that 'samples obtained are with high probability i.i.d. and distributed according to f' presupposes that the kernel-based proposal q satisfies M q(x) >= f(x) everywhere for some finite M. No description is given of how the kernel estimator is modified (e.g., by adding a base density, using conservative uniform-convergence bounds, or truncation) to enforce this inequality. Standard KDE can underestimate f on sets of positive measure, in which case the rejection step either rejects everything or produces a distribution that is no longer exactly f.
Authors: We agree that the domination condition must be enforced for the i.i.d. property to hold exactly. In the full manuscript (Section 3), the proposal is constructed as a convex combination of the kernel density estimate and a fixed base density with positive mass everywhere; the mixing weight and the resulting M are chosen via a uniform convergence bound on the KDE (using the bandwidth and sample size) so that Mq(x) >= f(x) holds with probability at least 1-delta. This modification is what yields the high-probability exactness claim. We will revise the abstract to reference this construction and the supporting lemma. revision: yes
-
Referee: [Abstract] Abstract: The claimed 'guarantee on the number of accepted samples' is stated without reference to any theorem, any explicit bound, or the conditions under which the bound holds. Because the acceptance probability is 1/M and M is determined by the learned q, the guarantee is load-bearing for the performance claim and must be derived explicitly.
Authors: The guarantee appears as Theorem 4.1, which supplies an explicit high-probability lower bound on the acceptance probability (hence on the number of accepted samples after a fixed number of proposals) in terms of the kernel bandwidth, the initial sample size, and the smoothness parameters of f. The proof relies on the same uniform deviation bound used to enforce domination. We will update the abstract to cite this theorem and state the dependence of the bound on the problem parameters. revision: yes
Circularity Check
No circularity: PRS inherits i.i.d. property from external rejection sampling theorem
full rationale
The derivation chain consists of (1) constructing a proposal q via kernel density estimation from samples of the target f and (2) feeding q into the standard rejection sampling procedure. The i.i.d. correctness claim is explicitly justified by the sentence 'Since our method builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f,' which invokes the classical rejection sampling theorem (an external, non-self-referential result) rather than re-deriving or fitting it inside the paper. No equation equates a fitted quantity to a 'prediction' of itself, no self-citation is load-bearing for the central guarantee, and no ansatz is smuggled via prior work. The acceptance-rate guarantee is presented as a consequence of the learned proposal lowering the rejection rate while preserving the domination condition required by the external theorem. Because the core correctness step reduces to an independent, well-known result outside the paper's fitted components, the derivation is self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- kernel bandwidth or smoothing parameter
axioms (2)
- domain assumption The target distribution admits a density that can be evaluated pointwise.
- ad hoc to paper Kernel density estimation produces a usable proposal that maintains the rejection-sampling correctness properties.
Reference graph
Works this paper leans on
-
[1]
write newline
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
-
[2]
An introduction to for machine learning
Andrieu, Christophe, De Freitas , Nando, Doucet, Arnaud, and Jordan, Michael I. An introduction to for machine learning . Machine Learning, 50 0 (1-2): 0 5--43, 2003
2003
-
[3]
The entropic barrier: A simple and optimal universal self-concordant barrier
Bubeck, S \' e bastien and Eldan, Ronen. The entropic barrier: A simple and optimal universal self-concordant barrier . In Conference on Learning Theory, 2015
2015
-
[4]
The algorithm: A joint approach to exact optimization and sampling
Dymetman, Marc, Bouchard, Guillaume, and Carter, Simon. The algorithm: A joint approach to exact optimization and sampling . Technical report, http://arxiv.org/abs/1207.0742, 2012
-
[5]
An interruptible algorithm for perfect sampling via Markov chains
Fill, James Allen. An interruptible algorithm for perfect sampling via Markov chains . Annals of Applied Probability, 8 0 (1): 0 131--162, 1998
1998
-
[6]
Gilks, W. R. Derivative-free adaptive rejection sampling for Gibbs sampling . Bayesian Statistics, 4, 1992
1992
-
[7]
Gilks, W. R. and Wild, P. Adaptive rejection sampling for Gibbs sampling . Journal of the Royal Statistical Society. Series C (Applied Statistics), 41 0 (2): 0 337--348, 1992
1992
-
[8]
R., Best, N
Gilks, W. R., Best, N. G., and Tan, K. K. C. Adaptive rejection metropolis sampling within Gibbs sampling . Journal of the Royal Statistical Society. Series C (Applied Statistics), 44 0 (4): 0 455--472, 1995
1995
-
[9]
Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections
Gin \' e , Evarist and Nickl, Richard. Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections . Bernoulli, 16 0 (4): 0 1137--1163, 2010 a
2010
-
[10]
Confidence bands in density estimation
Gin \' e , Evarist and Nickl, Richard. Confidence bands in density estimation . The Annals of Statistics, 38 0 (2): 0 1122--1170, 2010 b
2010
-
[11]
Concave-Convex adaptive rejection sampling
G \" o r \" u r, Dilan and Teh, Yee Whye. Concave-Convex adaptive rejection sampling . Journal of Computational and Graphical Statistics, 2011
2011
-
[12]
Canonical barriers on convex cones
Hildebrand, Roland. Canonical barriers on convex cones . Mathematics of Operations Research, 39 0 (3): 0 841--850, 2014
2014
-
[13]
The asymptotic minimax constant for sup-norm loss in nonparametric density estimation
Korostelev, Alexander and Nussbaum, Michael. The asymptotic minimax constant for sup-norm loss in nonparametric density estimation . Bernoulli, 5 0 (6): 0 1099--1118, 1999
1999
-
[14]
In Neural Information Processing Systems, 2014
Maddison, Chris J, Tarlow, Daniel, and Minka, Tom. . In Neural Information Processing Systems, 2014
2014
-
[15]
A generalization of the adaptive rejection sampling algorithm
Martino, Luca and M \' i guez, Joaqu \' i n. A generalization of the adaptive rejection sampling algorithm . Statistics and Computing, 21 0 (4): 0 633--647, 2011
2011
-
[16]
The Monte Carlo method
Metropolis, Nicholas and Ulam, S. The Monte Carlo method . Journal of the American Statistical Association, 44 0 (247): 0 335--341, 1949
1949
-
[17]
Expectation propagation for approximate Bayesian inference
Minka, Tom. Expectation propagation for approximate Bayesian inference . In Uncertainty in Artificial Intelligence, 2001
2001
-
[18]
Introductory lectures on convex optimization: A basic course
Nesterov, Yurii. Introductory lectures on convex optimization: A basic course . Kluwer Academic Publishers, 2004
2004
-
[19]
Coupling from the past: A user's guide
Propp, James and Wilson, David. Coupling from the past: A user's guide . In Microsurveys in Discrete Probability, 1998
1998
-
[20]
Tsybakov, A. B. Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes . Annals of Statistics, 26 0 (6): 0 2420--2469, 1998
1998
-
[21]
Nonparametric importance sampling
Zhang, Ping. Nonparametric importance sampling . Journal of the American Statistical Association, 91 0 (435): 0 1245--1253, 1996
1996
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.