Exact MCMC for Intractable Proposals

Dootika Vats; Dwija Kakkad

arxiv: 2410.10282 · v2 · submitted 2024-10-14 · 📊 stat.CO · stat.ME

Exact MCMC for Intractable Proposals

Dwija Kakkad , Dootika Vats This is my paper

Pith reviewed 2026-05-23 19:25 UTC · model grok-4.3

classification 📊 stat.CO stat.ME

keywords MCMCBernoulli factoryintractable proposalsMetropolis-Hastingsexact samplingBayesian inferenceaccept-reject algorithm

0 comments

The pith

Adapting Bernoulli factory methods yields exact MCMC sampling for proposals with unknown normalizing constants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that Bernoulli factory techniques, designed for doubly intractable targets, adapt naturally to make the Metropolis-Hastings accept-reject step exact even when the proposal distribution's normalizing constant is unknown. Instead of approximating the ratio or developing case-specific fixes, the approach constructs a Bernoulli factory that simulates the exact decision. Readers interested in Bayesian computation would care because it removes a common source of approximation error in MCMC without restricting proposal choice. The paper demonstrates this with three examples spanning relevant applications.

Core claim

We show how Bernoulli factory MCMC algorithms, originally proposed for doubly intractable target distributions, can quite naturally be adapted to yield an exact MCMC sampling method for proposals with unknown normalizing constants. This is achieved by expressing the acceptance probability ratio in a form suitable for Bernoulli factory construction, allowing exact simulation of the accept/reject decision.

What carries the argument

A Bernoulli factory constructed for the acceptance probability ratio in the Metropolis-Hastings algorithm when the proposal has an intractable normalizing constant

If this is right

Provides an exact alternative to approximate MCMC methods for intractable proposals.
Enables use of a broader class of proposal distributions in Bayesian inference.
Demonstrated through three diverse examples showing practical applicability.
Eliminates the need for specialized algorithms to handle each type of intractable proposal.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The technique might generalize to other accept-reject based samplers beyond standard MCMC.
It could lead to more flexible proposal designs in complex models where normalizing constants are hard to compute.
Efficiency comparisons with approximation methods would be a natural next step for implementation.

Load-bearing premise

The acceptance probability ratio involving the intractable proposal normalizing constant can be expressed or bounded in a form that permits an exact Bernoulli factory.

What would settle it

Finding a target and proposal pair where the ratio cannot be bounded or expressed suitably for any Bernoulli factory, causing the method to fail in producing exact samples.

Figures

Figures reproduced from arXiv: 2410.10282 by Dootika Vats, Dwija Kakkad.

**Figure 2.** Figure 2: Sensor network localiztion: Scatterplots using auxiliary variable method (left) [PITH_FULL_IMAGE:figures/full_fig_p016_2.png] view at source ↗

**Figure 3.** Figure 3: Cox: Estimated density plots for the 100th component using both methods. [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗

read the original abstract

Accept-reject based Markov chain Monte Carlo (MCMC) methods are the workhorse algorithm for Bayesian inference. These algorithms, like Metropolis-Hastings, require choosing a proposal distribution which is typically informed by the desired target distribution. Surprisingly, proposal distributions with unknown normalizing constants are not uncommon, even though for such a choice of a proposal, the Metropolis-Hastings acceptance ratio cannot be evaluated exactly. Across the literature, authors resort to approximation methods that yield inexact MCMC or develop specialized algorithms to combat this problem. We show how Bernoulli factory MCMC algorithms, originally proposed for doubly intractable target distributions, can quite naturally be adapted to yield an exact MCMC sampling method. We present three diverse and relevant examples demonstrating the usefulness of the Bernoulli factory approach to this problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adapts Bernoulli factory MCMC to proposals with intractable normalizing constants and backs it with three examples.

read the letter

The main point is that this work takes Bernoulli factory methods previously used for doubly intractable targets and applies them to get exact Metropolis-Hastings when the proposal itself has an unknown normalizing constant. The abstract frames this as a natural adaptation, and the stress-test note finds no internal inconsistency in that claim. Three examples are given to show the approach on different problems. That is the actual new piece: shifting the factory from the target to the proposal side. The examples appear relevant and help illustrate when the acceptance ratio involving the unknown Z can be turned into a factory. The paper does a reasonable job of positioning the method against the common workaround of approximations or specialized algorithms. The central exactness claim follows if the ratio admits a Bernoulli factory representation, which the abstract states it does in the cases considered. A soft spot is that Bernoulli factories often require many evaluations and can have high expected runtime; the paper would need to show that the examples stay practical rather than just theoretically exact. If the constructions in the examples are explicit and the efficiency is addressed, that concern stays minor. The work is aimed at computational statisticians who run into intractable proposals in Bayesian models and want an exact alternative. A reader already familiar with Bernoulli factories or exact MCMC would get the most out of it. The paper deserves a serious referee because it offers a general exact route with concrete demonstrations rather than just another approximation.

Referee Report

2 major / 2 minor

Summary. The paper claims that Bernoulli factory MCMC methods, developed for doubly intractable targets, can be adapted to Metropolis-Hastings proposals whose normalizing constants are intractable, thereby producing an exact MCMC algorithm without approximation. The adaptation is illustrated through three diverse examples.

Significance. If the adaptation is rigorously justified, the result would be useful because it supplies an exact sampling route for a practically relevant class of proposals where authors currently rely on approximations or specialized constructions. The reuse of existing Bernoulli factory machinery is a strength when the required acceptance probability admits an exact factory representation.

major comments (2)

[Abstract and §3 (examples)] The central claim rests on the assertion that the acceptance ratio (including the ratio of intractable proposal normalizers Z(x)/Z(x')) can be expressed in a form that admits an exact Bernoulli factory. The manuscript must explicitly state the conditions on the proposal family under which this representation exists and verify it for each of the three examples; otherwise the exactness guarantee is conditional rather than general.
[§2 (method)] It is unclear whether the Bernoulli factory construction preserves the correct stationary distribution when the factory is only an unbiased estimator of the acceptance probability rather than an exact indicator. A short derivation showing that the overall transition kernel still satisfies detailed balance would strengthen the exactness claim.

minor comments (2)

Notation for the intractable proposal normalizer should be introduced once and used consistently; the abstract uses Z(x) while later sections appear to switch between Z and other symbols.
The three examples would benefit from a short table comparing wall-clock time or effective sample size against a standard approximate alternative (e.g., pseudo-marginal or noisy MH) to quantify the practical overhead of the exact factory.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful report and the opportunity to clarify the exactness guarantees in our work. We address each major comment below and will revise the manuscript to incorporate the requested details.

read point-by-point responses

Referee: [Abstract and §3 (examples)] The central claim rests on the assertion that the acceptance ratio (including the ratio of intractable proposal normalizers Z(x)/Z(x')) can be expressed in a form that admits an exact Bernoulli factory. The manuscript must explicitly state the conditions on the proposal family under which this representation exists and verify it for each of the three examples; otherwise the exactness guarantee is conditional rather than general.

Authors: We agree that the conditions should be stated explicitly to make the exactness claim unconditional within the relevant class. In the revision we will add a short subsection to §2 that specifies the general conditions on the proposal family (namely, that the ratio of normalizers Z(x)/Z(x') together with the target ratio must admit a Bernoulli factory representation, which holds when the ratio can be written as a product or ratio of functions that are pointwise computable and bounded in [0,1] after suitable scaling). We will then verify these conditions explicitly for each of the three examples in §3, confirming that the required factories exist and can be implemented without approximation. revision: yes
Referee: [§2 (method)] It is unclear whether the Bernoulli factory construction preserves the correct stationary distribution when the factory is only an unbiased estimator of the acceptance probability rather than an exact indicator. A short derivation showing that the overall transition kernel still satisfies detailed balance would strengthen the exactness claim.

Authors: We will add the requested derivation in §2. The key point is that a Bernoulli factory, when it terminates, returns an exact Bernoulli random variable whose success probability equals the target acceptance probability p; it is not merely an unbiased estimator of p but an exact sampler from Bernoulli(p). Consequently the transition kernel is identical to the standard Metropolis-Hastings kernel that uses this p, and detailed balance follows from the usual argument once the factory has produced its exact 0/1 outcome. The derivation will be kept concise and will emphasize that no approximation enters the kernel. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an adaptation of existing Bernoulli factory MCMC methods (originally for doubly intractable targets) to proposals with intractable normalizing constants. The central claim of exact MCMC follows from the external properties of Bernoulli factories applied to the acceptance ratio, with three examples given. No derivation reduces by construction to fitted inputs, self-definitions, or self-citation chains; the approach is an extension rather than a self-referential result. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review identifies no free parameters, axioms, or invented entities; the contribution is described as an algorithmic adaptation of prior Bernoulli factory methods.

pith-pipeline@v0.9.0 · 5648 in / 1062 out tokens · 30662 ms · 2026-05-23T19:25:28.129114+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1]

Adams, R., Murray, I., and Mackay, D. (2009). Tractable nonparametric bayesian inference in poisson processes with gaussian process intensities. Proceedings of the 26th International Conference on Machine Learning , 382:2

work page 2009
[2]

Agrawal, S., Vats, D., atuszy \'n ski, K., and Roberts, G. O. (2023). Optimal scaling of MCMC beyond M etropolis. Advances in Applied Probability , 55(2):492--509

work page 2023
[3]

Barker, A. A. (1965). Monte C arlo calculations of the radial distribution functions for a proton? electron plasma. Australian Journal of Physics , 18(2):119--134

work page 1965
[4]

Botev, Z. I. (2016). The normal law under linear restrictions: Simulation and estimation via minimax tilting. Journal of the Royal Statistical Society Series B: Statistical Methodology , 79(1):125–148

work page 2016
[5]

and Campbell, T

Chen, N. and Campbell, T. (2024). Coreset markov chain M onte C arlo. In International Conference on Artificial Intelligence and Statistics , pages 4438--4446. PMLR

work page 2024
[6]

Cody, W. J. (1969). Rational C hebyshev approximations for the error function. Mathematics of Computation , 23(107):631--637

work page 1969
[7]

Dunson, D. B. and Johndrow, J. E. (2020). The H astings algorithm at fifty. Biometrika , 107(1):1--23

work page 2020
[8]

M., Hughes, J., Vats, D., Dai, N., Gupta, K., and Maji, U

Flegal, J. M., Hughes, J., Vats, D., Dai, N., Gupta, K., and Maji, U. (2021). mcmcse: Monte Carlo Standard Errors for MCMC . Riverside, CA, and Kanpur, India. R package version 1.5-0

work page 2021
[9]

B., Łatuszyński, K., and Roberts, G

Gonçalves, F. B., Łatuszyński, K., and Roberts, G. O. (2017). Barker's algorithm for Bayesian inference with intractable likelihoods

work page 2017
[10]

and Berliner, L

Herbei, R. and Berliner, L. M. (2014). Estimating ocean circulation: an MCMC approach with approximated likelihoods via the B ernoulli factory. Journal of the American Statistical Association , 109(507):944--954

work page 2014
[11]

Ihler, A., Fisher, J., Moses, R., and Willsky, A. (2005). Nonparametric belief propagation for self-localization of sensor networks. IEEE Journal on Selected Areas in Communications , 23(4):809--819

work page 2005
[12]

and O'Brien, G

Keane, M. and O'Brien, G. L. (1994). A B ernoulli factory. ACM Transactions on Modeling and Computer Simulation (TOMACS) , 4(2):213--219

work page 1994
[13]

atuszy \'n ski, K., Kosmidis, I., Papaspiliopoulos, O., and Roberts, G. O. (2011). Simulating events of unknown probabilities via reverse time M artingales. Random Structures & Algorithms , 38(4):441--452

work page 2011
[14]

and Roberts, G

atuszy \'n ski, K. and Roberts, G. O. (2013). Clts and asymptotic variance of time-sampled M arkov chains. Methodology and Computing in Applied Probability , 15(1):237--247

work page 2013
[15]

and Zanella, G

Livingstone, S. and Zanella, G. (2022). The B arker proposal: combining robustness and efficiency in gradient-based MCMC . Journal of the Royal Statistical Society Series B: Statistical Methodology , 84(2):496--523

work page 2022
[16]

L \'o pez-Lopera, A. F. (2018). lineqgpr: Gaussian process regression models with linear inequality constraints. R package version 0.0 , 3

work page 2018
[17]

F., John, S., and Durrande, N

Lopez-Lopera, A. F., John, S., and Durrande, N. (2019). Gaussian process modulated cox processes under linear inequality constraints. In Chaudhuri, K. and Sugiyama, M., editors, Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , volume 89 of Proceedings of Machine Learning Research , pages 1997--2006. PMLR

work page 2019
[18]

M ller, J., Pettitt, A., Reeves, R., and Berthelsen, K. (2006). An efficient markov chain monte carlo method for distributions with intractable normalising constants. Biometrika , 93(2):451--458

work page 2006
[19]

Neal, R. M. (2011). MCMC Using H amiltonian Dynamics . Chapman and Hall/CRC

work page 2011
[20]

Peskun, P. H. (1973). Optimum M onte- C arlo sampling using M arkov chains. Biometrika , 60(3):607--612

work page 1973
[21]

Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various M etropolis- H astings algorithms. Statistical science , 16(4):351--367

work page 2001
[22]

Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of L angevin distributions and their discrete approximations. Bernoulli , 2(4):341--363

work page 1996
[23]

and Ni, Y

Sarkar, B. and Ni, Y. (2024). Mr. rgm: An r package for fitting B ayesian multivariate bidirectional mendelian randomization networks. arXiv preprint arXiv:2403.03944

work page arXiv 2024
[24]

Tak, H., Meng, X.-L., and van Dyk, D. A. (2018). A repelling–attracting metropolis algorithm for multimodality. Journal of Computational and Graphical Statistics , 27(3):479–490

work page 2018
[25]

M., and Jones, G

Vats, D., Flegal, J. M., and Jones, G. L. (2019). Multivariate output analysis for M arkov chain M onte C arlo. Biometrika , 106(2):321--337

work page 2019
[26]

B., atuszy \'n ski, K., and Roberts, G

Vats, D., Gon c alves, F. B., atuszy \'n ski, K., and Roberts, G. O. (2022). Efficient bernoulli factory M arkov chain M onte C arlo for intractable posteriors. Biometrika , 109(2):369--385

work page 2022

[1] [1]

Adams, R., Murray, I., and Mackay, D. (2009). Tractable nonparametric bayesian inference in poisson processes with gaussian process intensities. Proceedings of the 26th International Conference on Machine Learning , 382:2

work page 2009

[2] [2]

Agrawal, S., Vats, D., atuszy \'n ski, K., and Roberts, G. O. (2023). Optimal scaling of MCMC beyond M etropolis. Advances in Applied Probability , 55(2):492--509

work page 2023

[3] [3]

Barker, A. A. (1965). Monte C arlo calculations of the radial distribution functions for a proton? electron plasma. Australian Journal of Physics , 18(2):119--134

work page 1965

[4] [4]

Botev, Z. I. (2016). The normal law under linear restrictions: Simulation and estimation via minimax tilting. Journal of the Royal Statistical Society Series B: Statistical Methodology , 79(1):125–148

work page 2016

[5] [5]

and Campbell, T

Chen, N. and Campbell, T. (2024). Coreset markov chain M onte C arlo. In International Conference on Artificial Intelligence and Statistics , pages 4438--4446. PMLR

work page 2024

[6] [6]

Cody, W. J. (1969). Rational C hebyshev approximations for the error function. Mathematics of Computation , 23(107):631--637

work page 1969

[7] [7]

Dunson, D. B. and Johndrow, J. E. (2020). The H astings algorithm at fifty. Biometrika , 107(1):1--23

work page 2020

[8] [8]

M., Hughes, J., Vats, D., Dai, N., Gupta, K., and Maji, U

Flegal, J. M., Hughes, J., Vats, D., Dai, N., Gupta, K., and Maji, U. (2021). mcmcse: Monte Carlo Standard Errors for MCMC . Riverside, CA, and Kanpur, India. R package version 1.5-0

work page 2021

[9] [9]

B., Łatuszyński, K., and Roberts, G

Gonçalves, F. B., Łatuszyński, K., and Roberts, G. O. (2017). Barker's algorithm for Bayesian inference with intractable likelihoods

work page 2017

[10] [10]

and Berliner, L

Herbei, R. and Berliner, L. M. (2014). Estimating ocean circulation: an MCMC approach with approximated likelihoods via the B ernoulli factory. Journal of the American Statistical Association , 109(507):944--954

work page 2014

[11] [11]

Ihler, A., Fisher, J., Moses, R., and Willsky, A. (2005). Nonparametric belief propagation for self-localization of sensor networks. IEEE Journal on Selected Areas in Communications , 23(4):809--819

work page 2005

[12] [12]

and O'Brien, G

Keane, M. and O'Brien, G. L. (1994). A B ernoulli factory. ACM Transactions on Modeling and Computer Simulation (TOMACS) , 4(2):213--219

work page 1994

[13] [13]

atuszy \'n ski, K., Kosmidis, I., Papaspiliopoulos, O., and Roberts, G. O. (2011). Simulating events of unknown probabilities via reverse time M artingales. Random Structures & Algorithms , 38(4):441--452

work page 2011

[14] [14]

and Roberts, G

atuszy \'n ski, K. and Roberts, G. O. (2013). Clts and asymptotic variance of time-sampled M arkov chains. Methodology and Computing in Applied Probability , 15(1):237--247

work page 2013

[15] [15]

and Zanella, G

Livingstone, S. and Zanella, G. (2022). The B arker proposal: combining robustness and efficiency in gradient-based MCMC . Journal of the Royal Statistical Society Series B: Statistical Methodology , 84(2):496--523

work page 2022

[16] [16]

L \'o pez-Lopera, A. F. (2018). lineqgpr: Gaussian process regression models with linear inequality constraints. R package version 0.0 , 3

work page 2018

[17] [17]

F., John, S., and Durrande, N

Lopez-Lopera, A. F., John, S., and Durrande, N. (2019). Gaussian process modulated cox processes under linear inequality constraints. In Chaudhuri, K. and Sugiyama, M., editors, Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics , volume 89 of Proceedings of Machine Learning Research , pages 1997--2006. PMLR

work page 2019

[18] [18]

M ller, J., Pettitt, A., Reeves, R., and Berthelsen, K. (2006). An efficient markov chain monte carlo method for distributions with intractable normalising constants. Biometrika , 93(2):451--458

work page 2006

[19] [19]

Neal, R. M. (2011). MCMC Using H amiltonian Dynamics . Chapman and Hall/CRC

work page 2011

[20] [20]

Peskun, P. H. (1973). Optimum M onte- C arlo sampling using M arkov chains. Biometrika , 60(3):607--612

work page 1973

[21] [21]

Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various M etropolis- H astings algorithms. Statistical science , 16(4):351--367

work page 2001

[22] [22]

Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of L angevin distributions and their discrete approximations. Bernoulli , 2(4):341--363

work page 1996

[23] [23]

and Ni, Y

Sarkar, B. and Ni, Y. (2024). Mr. rgm: An r package for fitting B ayesian multivariate bidirectional mendelian randomization networks. arXiv preprint arXiv:2403.03944

work page arXiv 2024

[24] [24]

Tak, H., Meng, X.-L., and van Dyk, D. A. (2018). A repelling–attracting metropolis algorithm for multimodality. Journal of Computational and Graphical Statistics , 27(3):479–490

work page 2018

[25] [25]

M., and Jones, G

Vats, D., Flegal, J. M., and Jones, G. L. (2019). Multivariate output analysis for M arkov chain M onte C arlo. Biometrika , 106(2):321--337

work page 2019

[26] [26]

B., atuszy \'n ski, K., and Roberts, G

Vats, D., Gon c alves, F. B., atuszy \'n ski, K., and Roberts, G. O. (2022). Efficient bernoulli factory M arkov chain M onte C arlo for intractable posteriors. Biometrika , 109(2):369--385

work page 2022