pith. sign in

arxiv: 2604.22385 · v1 · submitted 2026-04-24 · 📊 stat.ML · cs.LG

Pliable rejection sampling

Pith reviewed 2026-05-08 09:52 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords rejection samplingkernel density estimationadaptive samplingi.i.d. samplesproposal distributionMonte Carlo methodsdensity estimation
0
0 comments X

The pith

Pliable rejection sampling learns a proposal with a kernel estimator to reduce rejections while preserving i.i.d. samples from the target.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces pliable rejection sampling (PRS) to address the high rejection rates that limit standard rejection sampling. By learning the proposal distribution using a kernel estimator, PRS aims to make sampling more efficient. The key is that this learning does not compromise the guarantee that accepted samples are independent and identically distributed according to the target distribution f. This matters because it extends the usability of rejection sampling to more complex distributions in statistics and machine learning while maintaining theoretical correctness and providing bounds on accepted samples.

Core claim

Pliable rejection sampling (PRS) constructs the proposal distribution using a kernel estimator. Since PRS builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f. Moreover, PRS comes with a guarantee on the number of accepted samples.

What carries the argument

The pliable rejection sampling (PRS) procedure, which adapts the proposal via kernel density estimation to lower the rejection rate.

If this is right

  • Efficient sampling from arbitrary distributions becomes feasible without losing correctness.
  • Computational resources can be allocated more predictably due to acceptance guarantees.
  • Applications in Monte Carlo integration and Bayesian inference benefit from higher effective sample sizes.
  • The approach works for a broader class of target distributions than non-adaptive methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Replacing the kernel estimator with more advanced density estimators could further improve performance.
  • PRS might integrate with other sampling techniques like MCMC for hybrid methods.
  • Empirical tests on high-dimensional or multimodal distributions would reveal practical limits.
  • The method could inspire similar pliable adaptations in other sampling algorithms.

Load-bearing premise

That a kernel estimator can construct a proposal which lowers the rejection rate enough while keeping the samples i.i.d. from the target and satisfying the acceptance guarantees.

What would settle it

If experiments show that the accepted samples deviate significantly from the target distribution f according to a Kolmogorov-Smirnov test, or if the number of accepted samples falls below the guaranteed bound in repeated trials.

Figures

Figures reproduced from arXiv: 2604.22385 by Akram Erraqabi, Alexandra Carpentier, Michal Valko, Odalric-Ambrym Maillard.

Figure 1
Figure 1. Figure 1: Simple rejection sampling with a high rejection rate? The reason is that for every point proposed, we need to call f to decide whether this point is accepted. If many points are rejected, then f is called many times with few generated samples. When evaluating f is costly, then we are wasting resources. To alleviate this problem, adaptive rejection sam￾plers (Gilks, 1992; Gilks & Wild, 1992; Martino & M´ıgu… view at source ↗
Figure 2
Figure 2. Figure 2: Pliable rejection sampling 3.1. Uniform bounds for kernel regression estimation Let X1, . . . , XN be N points generated uniformly on [0, A] d . Let us define h def =hs(δ) = (log(NA/δ)/N) 1 2s+1 , fb(x) = Ad Nhd X N k=1 f(Xi)K  Xi − x h  . (1) Theorem 1 (proved in Appendix A). Assume that Assump￾tions 1 and 2 hold with 0 < s ≤ 2, C, C′ , C′′, c, c′′ > 0, and ε > 0. The estimate fb is such that with proba… view at source ↗
Figure 3
Figure 3. Figure 3: Left, center: Acceptance vs. peakiness. Right: 2D target (orange) and the pliable proposal (blue) n = 106 acceptance rate standard deviation PRS 66.4% 0.45% A ⋆ sampling 76.1% 0.80% SRS 25.0% 0.01% view at source ↗
read the original abstract

Rejection sampling is a technique for sampling from difficult distributions. However, its use is limited due to a high rejection rate. Common adaptive rejection sampling methods either work only for very specific distributions or without performance guarantees. In this paper, we present pliable rejection sampling (PRS), a new approach to rejection sampling, where we learn the sampling proposal using a kernel estimator. Since our method builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f. Moreover, PRS comes with a guarantee on the number of accepted samples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces pliable rejection sampling (PRS), an adaptive variant of rejection sampling in which the proposal distribution is constructed via kernel density estimation on an initial sample from the target f. The central claims are that the resulting samples remain i.i.d. from f (with high probability) and that the method supplies an explicit guarantee on the number of accepted samples.

Significance. If the domination condition M q(x) >= f(x) is rigorously enforced and the acceptance-rate guarantee is non-vacuous, PRS would constitute a practical improvement over classical rejection sampling for distributions where a good proposal is otherwise hard to specify. The combination of kernel estimation with exact rejection sampling is a natural direction, but its value hinges on whether the exactness property survives the estimation step.

major comments (2)
  1. [Abstract] Abstract: The statement that 'samples obtained are with high probability i.i.d. and distributed according to f' presupposes that the kernel-based proposal q satisfies M q(x) >= f(x) everywhere for some finite M. No description is given of how the kernel estimator is modified (e.g., by adding a base density, using conservative uniform-convergence bounds, or truncation) to enforce this inequality. Standard KDE can underestimate f on sets of positive measure, in which case the rejection step either rejects everything or produces a distribution that is no longer exactly f.
  2. [Abstract] Abstract: The claimed 'guarantee on the number of accepted samples' is stated without reference to any theorem, any explicit bound, or the conditions under which the bound holds. Because the acceptance probability is 1/M and M is determined by the learned q, the guarantee is load-bearing for the performance claim and must be derived explicitly.
minor comments (1)
  1. [Abstract] Abstract: The adjective 'pliable' is introduced without definition or motivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the constructive comments on the abstract. We address each major point below and will revise the manuscript to improve precision and explicitness on these aspects.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The statement that 'samples obtained are with high probability i.i.d. and distributed according to f' presupposes that the kernel-based proposal q satisfies M q(x) >= f(x) everywhere for some finite M. No description is given of how the kernel estimator is modified (e.g., by adding a base density, using conservative uniform-convergence bounds, or truncation) to enforce this inequality. Standard KDE can underestimate f on sets of positive measure, in which case the rejection step either rejects everything or produces a distribution that is no longer exactly f.

    Authors: We agree that the domination condition must be enforced for the i.i.d. property to hold exactly. In the full manuscript (Section 3), the proposal is constructed as a convex combination of the kernel density estimate and a fixed base density with positive mass everywhere; the mixing weight and the resulting M are chosen via a uniform convergence bound on the KDE (using the bandwidth and sample size) so that Mq(x) >= f(x) holds with probability at least 1-delta. This modification is what yields the high-probability exactness claim. We will revise the abstract to reference this construction and the supporting lemma. revision: yes

  2. Referee: [Abstract] Abstract: The claimed 'guarantee on the number of accepted samples' is stated without reference to any theorem, any explicit bound, or the conditions under which the bound holds. Because the acceptance probability is 1/M and M is determined by the learned q, the guarantee is load-bearing for the performance claim and must be derived explicitly.

    Authors: The guarantee appears as Theorem 4.1, which supplies an explicit high-probability lower bound on the acceptance probability (hence on the number of accepted samples after a fixed number of proposals) in terms of the kernel bandwidth, the initial sample size, and the smoothness parameters of f. The proof relies on the same uniform deviation bound used to enforce domination. We will update the abstract to cite this theorem and state the dependence of the bound on the problem parameters. revision: yes

Circularity Check

0 steps flagged

No circularity: PRS inherits i.i.d. property from external rejection sampling theorem

full rationale

The derivation chain consists of (1) constructing a proposal q via kernel density estimation from samples of the target f and (2) feeding q into the standard rejection sampling procedure. The i.i.d. correctness claim is explicitly justified by the sentence 'Since our method builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f,' which invokes the classical rejection sampling theorem (an external, non-self-referential result) rather than re-deriving or fitting it inside the paper. No equation equates a fitted quantity to a 'prediction' of itself, no self-citation is load-bearing for the central guarantee, and no ansatz is smuggled via prior work. The acceptance-rate guarantee is presented as a consequence of the learned proposal lowering the rejection rate while preserving the domination condition required by the external theorem. Because the core correctness step reduces to an independent, well-known result outside the paper's fitted components, the derivation is self-contained.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The method inherits standard rejection-sampling requirements and adds kernel density estimation, which itself depends on a smoothing parameter whose selection is not detailed in the abstract.

free parameters (1)
  • kernel bandwidth or smoothing parameter
    Kernel estimators require a bandwidth choice that affects proposal quality; its selection rule is not specified in the abstract.
axioms (2)
  • domain assumption The target distribution admits a density that can be evaluated pointwise.
    Required for the acceptance probability in rejection sampling.
  • ad hoc to paper Kernel density estimation produces a usable proposal that maintains the rejection-sampling correctness properties.
    Central assumption enabling the pliable adaptation.

pith-pipeline@v0.9.0 · 5389 in / 1119 out tokens · 31864 ms · 2026-05-08T09:52:57.209970+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 1 canonical work pages

  1. [1]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

  2. [2]

    An introduction to for machine learning

    Andrieu, Christophe, De Freitas , Nando, Doucet, Arnaud, and Jordan, Michael I. An introduction to for machine learning . Machine Learning, 50 0 (1-2): 0 5--43, 2003

  3. [3]

    The entropic barrier: A simple and optimal universal self-concordant barrier

    Bubeck, S \' e bastien and Eldan, Ronen. The entropic barrier: A simple and optimal universal self-concordant barrier . In Conference on Learning Theory, 2015

  4. [4]

    The algorithm: A joint approach to exact optimization and sampling

    Dymetman, Marc, Bouchard, Guillaume, and Carter, Simon. The algorithm: A joint approach to exact optimization and sampling . Technical report, http://arxiv.org/abs/1207.0742, 2012

  5. [5]

    An interruptible algorithm for perfect sampling via Markov chains

    Fill, James Allen. An interruptible algorithm for perfect sampling via Markov chains . Annals of Applied Probability, 8 0 (1): 0 131--162, 1998

  6. [6]

    Gilks, W. R. Derivative-free adaptive rejection sampling for Gibbs sampling . Bayesian Statistics, 4, 1992

  7. [7]

    Gilks, W. R. and Wild, P. Adaptive rejection sampling for Gibbs sampling . Journal of the Royal Statistical Society. Series C (Applied Statistics), 41 0 (2): 0 337--348, 1992

  8. [8]

    R., Best, N

    Gilks, W. R., Best, N. G., and Tan, K. K. C. Adaptive rejection metropolis sampling within Gibbs sampling . Journal of the Royal Statistical Society. Series C (Applied Statistics), 44 0 (4): 0 455--472, 1995

  9. [9]

    Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections

    Gin \' e , Evarist and Nickl, Richard. Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections . Bernoulli, 16 0 (4): 0 1137--1163, 2010 a

  10. [10]

    Confidence bands in density estimation

    Gin \' e , Evarist and Nickl, Richard. Confidence bands in density estimation . The Annals of Statistics, 38 0 (2): 0 1122--1170, 2010 b

  11. [11]

    Concave-Convex adaptive rejection sampling

    G \" o r \" u r, Dilan and Teh, Yee Whye. Concave-Convex adaptive rejection sampling . Journal of Computational and Graphical Statistics, 2011

  12. [12]

    Canonical barriers on convex cones

    Hildebrand, Roland. Canonical barriers on convex cones . Mathematics of Operations Research, 39 0 (3): 0 841--850, 2014

  13. [13]

    The asymptotic minimax constant for sup-norm loss in nonparametric density estimation

    Korostelev, Alexander and Nussbaum, Michael. The asymptotic minimax constant for sup-norm loss in nonparametric density estimation . Bernoulli, 5 0 (6): 0 1099--1118, 1999

  14. [14]

    In Neural Information Processing Systems, 2014

    Maddison, Chris J, Tarlow, Daniel, and Minka, Tom. . In Neural Information Processing Systems, 2014

  15. [15]

    A generalization of the adaptive rejection sampling algorithm

    Martino, Luca and M \' i guez, Joaqu \' i n. A generalization of the adaptive rejection sampling algorithm . Statistics and Computing, 21 0 (4): 0 633--647, 2011

  16. [16]

    The Monte Carlo method

    Metropolis, Nicholas and Ulam, S. The Monte Carlo method . Journal of the American Statistical Association, 44 0 (247): 0 335--341, 1949

  17. [17]

    Expectation propagation for approximate Bayesian inference

    Minka, Tom. Expectation propagation for approximate Bayesian inference . In Uncertainty in Artificial Intelligence, 2001

  18. [18]

    Introductory lectures on convex optimization: A basic course

    Nesterov, Yurii. Introductory lectures on convex optimization: A basic course . Kluwer Academic Publishers, 2004

  19. [19]

    Coupling from the past: A user's guide

    Propp, James and Wilson, David. Coupling from the past: A user's guide . In Microsurveys in Discrete Probability, 1998

  20. [20]

    Tsybakov, A. B. Pointwise and sup-norm sharp adaptive estimation of functions on the Sobolev classes . Annals of Statistics, 26 0 (6): 0 2420--2469, 1998

  21. [21]

    Nonparametric importance sampling

    Zhang, Ping. Nonparametric importance sampling . Journal of the American Statistical Association, 91 0 (435): 0 1245--1253, 1996