pith. sign in

arxiv: 2605.21651 · v1 · pith:XJE36F2Rnew · submitted 2026-05-20 · 📊 stat.ME · stat.CO

Similarity-Driven Proposals for MCMC Algorithms on Discrete Spaces

Pith reviewed 2026-05-22 08:40 UTC · model grok-4.3

classification 📊 stat.ME stat.CO
keywords MCMCdiscrete spacessimilarity-driven proposalshierarchical modelslatent variablesDirichlet-Multinomial regressionposterior sampling
0
0 comments X

The pith

Similarity-driven proposals guide MCMC sampling on discrete spaces by favoring states that match data according to a discrepancy measure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops MCMC algorithms that use similarity-driven proposals to sample posterior distributions supported on discrete state spaces. These proposals direct the chain toward states that the data favor, measured by a discrepancy between observations and the candidate model. The construction works for hierarchical models that contain both discrete variables and extra latent variables, without any need to integrate the latents out of the posterior. The same framework is shown to apply to a Dirichlet-Multinomial regression example on real data.

Core claim

The paper introduces MCMC algorithms whose proposals are driven by a data-based measure of similarity between observations and the model. This mechanism produces valid transitions that concentrate on high-posterior states and extends without modification to hierarchical specifications that include both discrete parameters and additional latent components.

What carries the argument

Similarity-driven proposal that uses a data-driven discrepancy measure to bias the next state toward regions favored by the posterior.

If this is right

  • Hierarchical models mixing discrete variables with latent components can be sampled directly without marginalization.
  • The same proposal construction applies to regression settings such as Dirichlet-Multinomial models.
  • Simulation studies confirm that the resulting chains remain valid while targeting the intended posterior.
  • Real-data examples demonstrate practical use on models that previous discrete MCMC methods could not handle without integration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The discrepancy measure could be replaced by other data-driven scores, potentially improving performance in specific application domains.
  • The approach may reduce the computational burden of repeated marginalization steps in large hierarchical models.
  • Similar data-similarity ideas might transfer to other discrete or combinatorial sampling problems outside the hierarchical setting shown here.

Load-bearing premise

A data-driven discrepancy measure between observations and a proposed model can steer the chain toward high-posterior states without introducing bias or poor mixing.

What would settle it

A small discrete model whose exact posterior is known by enumeration, run with the new proposals, yields an empirical distribution that visibly differs from the true posterior.

Figures

Figures reproduced from arXiv: 2605.21651 by Alexandros Beskos, Luca Aiello, Maria De Iorio, Raffaele Argiento.

Figure 1
Figure 1. Figure 1: Acceptance rates for various choices of λ using the F-test proposal. To illustrate the effectiveness of the proposed adaptive tuning scheme, we conduct a controlled experiment on the simulated dataset described above. The goal is not to opti￾mize predictive performance per se, but to verify that the adaptation procedure successfully converges to a value of λ that maximizes the acceptance rate. In this expe… view at source ↗
Figure 2
Figure 2. Figure 2: Lambda (left) and acceptance rate (right) evolution across iteration with the adaptive scheme. highlighting similarities and differences in their tail behavior and robustness. 3.1 Jump distances with local-move proposal To account for dependencies among predictors, we introduce an optional local-move step. At each iteration, an active predictor with at least one inactive neighbor, identified via a correlat… view at source ↗
Figure 3
Figure 3. Figure 3: Empirical distribution of Hamming distances between consecutive MCMC itera￾tions when including the local-move step across 100 independent runs for the F-test proposal (left) and LR-test proposal (right). where yi+ = PJ j=1 yij , and ϕi is defined on the (J − 1)-dimensional simplex S J−1 = n (ϕi1, . . . , ϕiJ ) : ϕij ≥ 0, X J j=1 ϕij = 1o Imposing a conjugate Dirichlet prior on ϕi , ϕi | γi ∼ Dirichlet(γi)… view at source ↗
Figure 4
Figure 4. Figure 4: Posterior inclusion probabilities (ξbpj ). Red dots indicate associations with PIP > 0.76, corresponding to Bayesian FDR of 0.05. genus, and nutrient. The selected associations are concentrated primarily within the order Bacteroidales, with multiple nutrient associations identified for Bacteroides, including phos￾phorus, iodine, vitamin E, food fortification, maltose, and hydroxyproline. Additional asso￾ci… view at source ↗
Figure 5
Figure 5. Figure 5: MCMC diagnostics for the adaptive F-stat flip proposal with the additional local￾move proposal algorithm: PIPs for all predictors, with truly active variables highlighted in red (top), traceplot of the model size (middle) and autocorrelation function of the model size (bottom). 36 [PITH_FULL_IMAGE:figures/full_fig_p036_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: MCMC diagnostics for the adaptive LR flip proposal with the additional local￾move proposal algorithm: PIPs for all predictors, with truly active variables highlighted in red (top), traceplot of the model size (middle) and autocorrelation function of the model size (bottom). 37 [PITH_FULL_IMAGE:figures/full_fig_p037_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Acceptance rates for various combinations of n, P, and λ using the LR-test proposal. To further evaluate the adaptive tuning strategy introduced in Algorithm 1, we perform an analogous controlled experiment using the LR-based proposal. As in the F-test setting, the aim is not to optimize predictive performance but to verify that the Robbins-Monro adaptation converges toward the value of λ that maximizes th… view at source ↗
Figure 8
Figure 8. Figure 8: Lambda (left) and acceptance rate (right) evolution across iterations for the LR-based adaptive scheme. In summary, the LR-test-based proposal exhibits behavior that is strongly consistent with the F-test-based mechanism. By exponentially weighting a dissimilarity derived from the LR-test p-value, the method preferentially targets variables that provide substantial improvements in model fit while retaining… view at source ↗
Figure 9
Figure 9. Figure 9: Diagram of the model. 44 [PITH_FULL_IMAGE:figures/full_fig_p044_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Posterior inclusion probabilities (ξbpj ) obtained with the adaptive proposal aug￾mented by the local move step. Red dots indicate selected associations under the same decision rule used in the main real-data analysis. 46 [PITH_FULL_IMAGE:figures/full_fig_p046_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Convergence diagnostics: trace plot of active associations (left) and autocorre￾lation function (right). The corresponding convergence diagnostics when including the move step are shown in [PITH_FULL_IMAGE:figures/full_fig_p047_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Convergence diagnostics for the adaptive proposal augmented by the local move step: trace plot of the number of active associations (left) and autocorrelation function (right). 48 [PITH_FULL_IMAGE:figures/full_fig_p048_12.png] view at source ↗
read the original abstract

Recent research has led to the development of MCMC algorithms with likelihood-informed proposals when targeting posterior distributions supported on discrete state spaces. Our work is placed within this field and puts forward a new MCMC methodology based upon similarity-driven proposals. Such proposals sway transitions towards states favored by the posterior via use of a data-driven measure of discrepancy between observations and the proposed model. Our approach can naturally cover classes of hierarchical models that involve both discrete variables and additional latent ones, without a requirement of integrating our the latter, in contrast to previous works in this field. The new algorithms are illustrated in simulation settings and in a involved real data scenario with a Dirichlet-Multinomial regression model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces similarity-driven proposals for MCMC algorithms targeting posterior distributions on discrete state spaces. The proposals use a data-driven discrepancy measure between observations and the proposed model to direct transitions toward states with higher posterior probability. A key feature is the ability to handle hierarchical models involving both discrete variables and additional latent variables without integrating out the latents, differing from previous methods. The approach is demonstrated in simulation studies and a real data application with a Dirichlet-Multinomial regression model.

Significance. If the proposals define valid Metropolis-Hastings kernels, the work could advance sampling for complex discrete hierarchical models by avoiding intractable marginalization of latent variables. This extends existing likelihood-informed MCMC methods and may yield better mixing in high-dimensional discrete spaces, with the real-data Dirichlet-Multinomial example indicating practical utility.

major comments (1)
  1. [Methods / proposal construction] The central construction of the similarity-driven proposal must be shown to yield a valid proposal kernel q(·|·) such that the Metropolis-Hastings acceptance probability restores the target posterior as the invariant distribution. Please provide the explicit form of the discrepancy-based proposal probability and the resulting acceptance ratio (likely in the main methods section).
minor comments (2)
  1. [Abstract] Abstract contains a typographical error: 'integrating our the latter' should read 'integrating out the latter'.
  2. [Numerical experiments] The real-data Dirichlet-Multinomial regression example would benefit from a table or figure reporting effective sample sizes or autocorrelation times to quantify mixing improvement over baselines.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thorough review and for recommending minor revision. We have addressed the major comment by expanding the Methods section to explicitly establish the validity of the proposal kernel.

read point-by-point responses
  1. Referee: [Methods / proposal construction] The central construction of the similarity-driven proposal must be shown to yield a valid proposal kernel q(·|·) such that the Metropolis-Hastings acceptance probability restores the target posterior as the invariant distribution. Please provide the explicit form of the discrepancy-based proposal probability and the resulting acceptance ratio (likely in the main methods section).

    Authors: We agree that explicitly demonstrating the validity of the similarity-driven proposal kernel is necessary to confirm that the Metropolis-Hastings algorithm targets the correct posterior. In the original manuscript the construction was motivated and described at a high level, but the explicit functional form of q(·|·) and the full acceptance ratio were not isolated in a dedicated derivation. In the revised version we have inserted a new subsection in the Methods section that (i) defines the data-driven discrepancy measure D(y, θ) between observations and the proposed state, (ii) gives the normalized proposal probability q(θ′|θ) ∝ exp(−D(y, θ′)) (with the normalizing constant shown to be finite), and (iii) derives the Metropolis-Hastings ratio α(θ, θ′) = min{1, [π(θ′)q(θ|θ′)] / [π(θ)q(θ′|θ)]} where π denotes the target posterior. This addition establishes that the chain is reversible with respect to π and therefore leaves the posterior invariant. The revision is confined to the Methods section and does not alter any results or conclusions. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper introduces similarity-driven proposals for MCMC on discrete spaces as an extension of existing likelihood-informed methods. The central construction relies on a data-driven discrepancy measure to guide proposals while preserving the posterior as the invariant distribution via Metropolis-Hastings. No step reduces by construction to a fitted parameter renamed as prediction, a self-definitional loop, or a load-bearing self-citation whose validity depends on the current work. The advantage for hierarchical models without marginalization is presented as a direct consequence of the proposal design rather than an imported uniqueness theorem or ansatz from prior author work. The derivation remains self-contained against external benchmarks of MCMC validity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.0 · 5637 in / 1120 out tokens · 27398 ms · 2026-05-22T08:40:59.899585+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages

  1. [1]

    Statistical Science , volume=

    Approximating Bayes in the 21st century , author=. Statistical Science , volume=. 2024 , publisher=

  2. [2]

    arXiv preprint arXiv:2502.11738 , year=

    Surrogate-based ABC matches generalized Bayesian inference under specific discrepancy and kernel choices , author=. arXiv preprint arXiv:2502.11738 , year=

  3. [3]

    and Durante, D

    Legramanti, S. and Durante, D. and Alquier, P. , journal=. 2025 , publisher=

  4. [4]

    and Rosenthal, J

    Roberts, G. and Rosenthal, J. , journal=. 2001 , publisher=

  5. [5]

    , journal=

    Zanella, G. , journal=. 2020 , publisher=

  6. [6]

    and Yang, J

    Zhou, Q. and Yang, J. and Vats, D. and Roberts, G. and Rosenthal, J. , journal=. 2022 , publisher=

  7. [7]

    and Rosenthal, J

    Roberts, G. and Rosenthal, J. , journal=. 1998 , publisher=

  8. [8]

    and Yau, C

    Titsias, M. and Yau, C. , journal=. 2017 , publisher=

  9. [9]

    2013 , publisher=

    An introduction to statistical learning , author=. 2013 , publisher=

  10. [10]

    Electronic Journal of Statistics , volume=

    Informed reversible jump algorithms , author=. Electronic Journal of Statistics , volume=. 2021 , publisher=

  11. [11]

    , journal=

    Green, P. , journal=. 1995 , publisher=

  12. [12]

    and Argiento, R

    Wadsworth, D. and Argiento, R. and Guindani, M. and Galloway-Pena, J. and Shelburne, S. and Vannucci, M. , journal=. 2017 , publisher=

  13. [13]

    and Li, H

    Chen, J. and Li, H. , journal=

  14. [14]

    Journal of the American Statistical Association , volume=

    Multinomial inverse regression for text analysis , author=. Journal of the American Statistical Association , volume=. 2013 , publisher=

  15. [15]

    Science , volume=

    Linking long-term dietary patterns with gut microbial enterotypes , author=. Science , volume=. 2011 , publisher=

  16. [16]

    and Kuczynski, J

    Caporaso, G. and Kuczynski, J. and Stombaugh, J. and Bittinger, K. and Bushman, F. and Costello, E. and Fierer, N. and Pe. Nature Methods , volume=. 2010 , publisher=

  17. [17]

    and Casella, G

    Robert, C. and Casella, G. , isbn=. 2005 , publisher=

  18. [18]

    The Journal of Chemical Physics , volume=

    Equation of state calculations by fast computing machines , author=. The Journal of Chemical Physics , volume=. 1953 , publisher=

  19. [19]

    , journal=

    Hastings, W. , journal=. 1970 , publisher=

  20. [20]

    , booktitle=

    Neal, R. , booktitle=. 2011 , publisher=

  21. [21]

    and Calderhead, B

    Girolami, M. and Calderhead, B. , journal=. 2011 , publisher=

  22. [22]

    and Teh, Y

    Welling, M. and Teh, Y. , booktitle=. 2011 , address=

  23. [23]

    and Roberts, G

    Durmus, A. and Roberts, G. and Vilmart, G. and Zygalakis, K. , journal=. 2017 , publisher=

  24. [24]

    2011 , publisher=

    Statistical inference: the minimum distance approach , author=. 2011 , publisher=

  25. [25]

    , title =

    Zellner, A. , title =. Journal of the American Statistical Association , year =

  26. [26]

    Biometrika , volume=

    Robust and efficient estimation by minimising a density power divergence , author=. Biometrika , volume=. 1998 , publisher=

  27. [27]

    Journal of the Royal Statistical Society: Series B (Methodological) , volume=

    Spatial interaction and the statistical analysis of lattice systems , author=. Journal of the Royal Statistical Society: Series B (Methodological) , volume=

  28. [28]

    and Liverani, S

    Hastie, D. and Liverani, S. and Richardson, S. , journal=

  29. [29]

    Biostatistics , volume=

    Sparse inverse covariance estimation with the graphical lasso , author=. Biostatistics , volume=. 2008 , publisher=

  30. [30]

    The Annals of Mathematical Statistics , volume=

    Table for estimating the goodness of fit of empirical distributions , author=. The Annals of Mathematical Statistics , volume=. 1948 , publisher=

  31. [31]

    Scandinavian Actuarial Journal , volume=

    Cram. Scandinavian Actuarial Journal , volume=. 1928 , publisher=

  32. [32]

    2005 , publisher=

    Statistics for experimenters: design, innovation, and discovery , author=. 2005 , publisher=

  33. [33]

    Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

    Bayesian model selection using test statistics , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2009 , publisher=

  34. [34]

    and Heritier, S

    Copt, S. and Heritier, S. , journal=. 2007 , publisher=

  35. [35]

    and Neath, A

    Riedle, B. and Neath, A. and Cavanaugh, J. , journal=. 2020 , publisher=

  36. [36]

    and McCulloch, R

    George, E. and McCulloch, R. , journal=. 1997 , publisher=

  37. [37]

    Journal of the American Statistical Association , volume=

    Shotgun stochastic search for “large p” regression , author=. Journal of the American Statistical Association , volume=. 2007 , publisher=

  38. [38]

    Annals of Statistics , volume=

    Variable selection and high-dimensional graphs with the lasso , author=. Annals of Statistics , volume=

  39. [39]

    and Kannan, B

    Peters, G. and Kannan, B. and Lasscock, B. and Mellen, C. and others , journal=. 2010 , publisher=

  40. [40]

    and Livingstone, S

    Liang, X. and Livingstone, S. and Griffin, J. , journal=. 2023 , publisher=

  41. [41]

    and Livingstone, S

    Liang, X. and Livingstone, S. and Griffin, J. , journal=. 2022 , publisher=

  42. [42]

    and Griffin, J

    Wan, K. and Griffin, J. , journal=. 2021 , publisher=

  43. [43]

    , journal=

    Tierney, L. , journal=. 1998 , publisher=

  44. [44]

    2012 , publisher=

    Markov chains and stochastic stability , author=. 2012 , publisher=

  45. [45]

    , journal=

    Peskun, P. , journal=. 1973 , publisher=

  46. [46]

    2017 , publisher=

    Markov chains and mixing times , author=. 2017 , publisher=

  47. [47]

    and Thoms, J

    Andrieu, C. and Thoms, J. , journal=. 2008 , publisher=

  48. [48]

    The Annals of Mathematical Statistics , pages=

    A stochastic approximation method , author=. The Annals of Mathematical Statistics , pages=. 1951 , publisher=

  49. [49]

    2003 , publisher=

    Stochastic approximation and recursive algorithms and applications , author=. 2003 , publisher=

  50. [50]

    and Rosenthal, J

    Roberts, G. and Rosenthal, J. , journal=. 2007 , publisher=

  51. [51]

    and Rosenthal, J

    Roberts, G. and Rosenthal, J. , journal=. 2009 , publisher=

  52. [52]

    and Gilks, W

    Gelman, A. and Gilks, W. and Roberts, G. , journal=. 1997 , publisher=

  53. [53]

    and Roberts, G

    Gelman, A. and Roberts, G. and Gilks, W. , journal=. 1996 , publisher=

  54. [54]

    Nutrients , volume=

    The role of microbial amino acid metabolism in host metabolism , author=. Nutrients , volume=. 2015 , publisher=

  55. [55]

    and Nilsson, A

    Kovatcheva-Datchary, P. and Nilsson, A. and Akrami, R. and Lee, Y. and De Vadder, F. and Arora, T. and Hallen, A. and Martens, E. and Bj. Cell Metabolism , volume=. 2015 , publisher=

  56. [56]

    and Vodnar, D

    Precup, G. and Vodnar, D. , journal=. 2019 , publisher=

  57. [57]

    Nature , volume=

    Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease , author=. Nature , volume=. 2011 , publisher=

  58. [58]

    New England Journal of Medicine , volume=

    Intestinal microbial metabolism of phosphatidylcholine and cardiovascular risk , author=. New England Journal of Medicine , volume=. 2013 , publisher=

  59. [59]

    and Frank, M

    Radka, C. and Frank, M. and Rock, C. and Yao, J. , journal=. 2020 , publisher=

  60. [60]

    and Wearsch, P

    Parker, B. and Wearsch, P. and Veloo, A. and Rodriguez-Palacios, A. , journal=. 2020 , publisher=

  61. [61]

    Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

    A general framework for updating belief distributions , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2016 , publisher=

  62. [62]

    The Annals of Statistics , pages=

    Gibbs Posterior for Variable Selection in High-Dimensional Classification and Data Mining , author=. The Annals of Statistics , pages=. 2008 , publisher=

  63. [63]

    Annual review of statistics and its application , volume=

    Approximate bayesian computation , author=. Annual review of statistics and its application , volume=. 2019 , publisher=

  64. [64]

    , journal=

    Geyer, C. , journal=. 1992 , publisher=

  65. [65]

    and Neal, R

    Jain, S. and Neal, R. , journal=. A split-merge. 2004 , publisher=

  66. [66]

    and McCulloch, R

    George, E. and McCulloch, R. , journal=. Variable selection via. 1993 , publisher=

  67. [67]

    The Annals of Statistics , volume=

    Markov Chains for Exploring Posterior Distributions , author=. The Annals of Statistics , volume=. 1994 , publisher=

  68. [68]

    2009 , publisher=

    Robust Statistics , author=. 2009 , publisher=