Approximate Bayesian Computation sequential Monte Carlo via random forests
Pith reviewed 2026-05-24 00:02 UTC · model grok-4.3
The pith
Distributional random forests combined with sequential Monte Carlo let approximate Bayesian computation infer joint posteriors directly from simulations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We further adapt random forests to the ABC setting in two ways. The first exploits distributional random forests to provide a direct method for inferring the joint posterior distribution of parameters of interest, while the second describes a sequential Monte Carlo approach which updates the prior distribution iteratively to focus on the most likely regions in the parameter space. We show that the new methods can accurately infer posterior distributions for a wide range of deterministic and stochastic models in different scientific areas.
What carries the argument
Distributional random forests trained directly on simulated parameter-output pairs to produce joint posterior distributions, combined with sequential Monte Carlo prior updating.
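For orientation, the idea of sequentially refocusing proposals can be sketched with a classical likelihood-free scheme. The block below is ABC population Monte Carlo with decreasing tolerances, not the random-forest-based update the paper describes; the toy Gaussian simulator, the flat prior on [-5, 5], the tolerance schedule, and every function name are assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_mean(theta, n_obs, rng):
    """Toy simulator (assumed): sample mean of n_obs draws from Normal(theta, 1)."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n_obs)])

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def abc_pmc(s_obs, n_obs, tolerances, n_particles, rng, prior=(-5.0, 5.0)):
    """Classical ABC-PMC: each round proposes from the previous weighted particles."""
    lo, hi = prior
    particles, weights = [], []
    for round_index, eps in enumerate(tolerances):
        if round_index > 0:
            # Gaussian perturbation kernel with twice the weighted particle variance.
            mean = sum(w * p for w, p in zip(weights, particles))
            var = sum(w * (p - mean) ** 2 for w, p in zip(weights, particles))
            kernel_sd = math.sqrt(2.0 * var)
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            if round_index == 0:
                theta = rng.uniform(lo, hi)          # first round: draw from the prior
            else:
                centre = rng.choices(particles, weights)[0]
                theta = rng.gauss(centre, kernel_sd)
                if not lo <= theta <= hi:            # zero prior density outside the support
                    continue
            if abs(simulate_mean(theta, n_obs, rng) - s_obs) > eps:
                continue                             # simulated summary too far from observed
            if round_index == 0:
                w = 1.0
            else:
                # Importance weight: prior / proposal mixture; the flat prior cancels
                # once the weights are normalised.
                w = 1.0 / sum(wj * normal_pdf(theta, pj, kernel_sd)
                              for wj, pj in zip(weights, particles))
            new_particles.append(theta)
            new_weights.append(w)
        total = sum(new_weights)
        particles = new_particles
        weights = [w / total for w in new_weights]
    return particles, weights

rng = random.Random(7)
particles, weights = abc_pmc(s_obs=1.5, n_obs=20, tolerances=[1.0, 0.5, 0.2],
                             n_particles=200, rng=rng)
weighted_mean = sum(w * p for w, p in zip(weights, particles))
```

The importance weights correct for the fact that later rounds propose from the previous particle population rather than from the prior, which is what allows simulations to concentrate in high-posterior regions without biasing the target.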
Load-bearing premise
Distributional random forests trained on simulated parameter-output pairs will produce well-calibrated joint posteriors without the usual summary-statistic selection step.
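As a reading aid, a forest-based posterior estimate of this kind is usually written as a weighted empirical distribution over the simulated parameters. The notation here is assumed rather than taken from the paper: (θ_i, y_i), i = 1, ..., N are the simulated parameter-output pairs, y_obs is the observed output, B is the number of trees, and L_b(y) is the set of training outputs sharing a leaf with y in tree b.

```latex
\hat{\pi}(\theta \mid y_{\mathrm{obs}})
  = \sum_{i=1}^{N} w_i(y_{\mathrm{obs}})\, \delta_{\theta_i}(\theta),
\qquad
w_i(y) = \frac{1}{B} \sum_{b=1}^{B}
  \frac{\mathbf{1}\{\, y_i \in L_b(y) \,\}}{\lvert L_b(y) \rvert}.
```

The weights are non-negative and sum to one, so calibration of the resulting posterior depends entirely on whether the leaves group simulations whose parameters are distributed like the true posterior at y_obs.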
What would settle it
A benchmark experiment on a model with a known analytic posterior, such as a low-dimensional Gaussian, that checks whether the random-forest-derived credible intervals achieve the claimed coverage rates.
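A minimal sketch of such a coverage benchmark, assuming a conjugate normal-normal model so the exact posterior is available in closed form. The function `sample_posterior` is a hypothetical placeholder for the method under test; here it draws from the exact posterior, so the harness should report coverage close to the nominal level. All names and settings are illustrative.

```python
import random

def analytic_posterior(y, sigma=1.0, prior_mean=0.0, prior_sd=2.0):
    """Conjugate normal-normal posterior (mean, sd) for the mean of y."""
    precision = 1.0 / prior_sd**2 + len(y) / sigma**2
    mean = (prior_mean / prior_sd**2 + sum(y) / sigma**2) / precision
    return mean, precision ** -0.5

def sample_posterior(y, n_draws, rng):
    """Hypothetical stand-in for the method under test (exact posterior draws)."""
    mean, sd = analytic_posterior(y)
    return [rng.gauss(mean, sd) for _ in range(n_draws)]

def empirical_coverage(level=0.9, n_reps=500, n_obs=10, n_draws=400, seed=1):
    """Fraction of replicates whose central credible interval contains the truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        theta = rng.gauss(0.0, 2.0)                      # true parameter from the prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        draws = sorted(sample_posterior(y, n_draws, rng))
        lower = draws[round((1 - level) / 2 * n_draws)]   # empirical lower quantile
        upper = draws[round((1 + level) / 2 * n_draws) - 1]
        hits += lower <= theta <= upper
    return hits / n_reps

coverage = empirical_coverage()
```

Swapping `sample_posterior` for a forest-derived sampler and comparing `coverage` with `level` across several levels would give the check described above.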
Original abstract
Approximate Bayesian Computation (ABC) is a popular inference method when likelihoods are hard to come by. Practical bottlenecks of ABC applications include selecting statistics that summarize the data without losing too much information or introducing uncertainty, and choosing distance functions and tolerance thresholds that balance accuracy and computational efficiency. Recent studies have shown that ABC methods using random forest (RF) methodology perform well while circumventing many of ABC's drawbacks. However, RF construction is computationally expensive for large numbers of trees and model simulations, and there can be high uncertainty in the posterior if the prior distribution is uninformative. Here we further adapt random forests to the ABC setting in two ways. The first exploits distributional random forests to provide a direct method for inferring the joint posterior distribution of parameters of interest, while the second describes a sequential Monte Carlo approach which updates the prior distribution iteratively to focus on the most likely regions in the parameter space. We show that the new methods can accurately infer posterior distributions for a wide range of deterministic and stochastic models in different scientific areas.
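The bottlenecks named in the abstract are easiest to see in plain rejection ABC, where the summary statistic, the distance, and the tolerance are all explicit choices. The sketch below is that baseline algorithm on an assumed toy model (Normal data with unknown mean, flat prior on [-5, 5]); it is not the paper's method, and the names are illustrative.

```python
import random
import statistics

def simulate(theta, n_obs, rng):
    """Toy simulator (assumed): n_obs draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n_obs)]

def summary(data):
    """Summary statistic: the sample mean, which is sufficient for this toy model."""
    return statistics.fmean(data)

def rejection_abc(observed, n_sims, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)                    # draw from the flat prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:               # distance function and threshold
            accepted.append(theta)
    return accepted

rng = random.Random(0)
observed = simulate(1.5, 20, rng)
posterior_draws = rejection_abc(observed, n_sims=20000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior_draws)
acceptance_rate = len(posterior_draws) / 20000
```

With a flat prior most simulations are rejected, which illustrates the cost of an uninformative prior; tightening `tolerance` improves accuracy but lowers `acceptance_rate` further.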
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes two extensions to random-forest Approximate Bayesian Computation: distributional random forests that estimate the joint posterior directly from raw simulated parameter-output pairs, and an ABC-SMC variant that iteratively refines the prior toward high-probability regions. The central claim is that these methods accurately recover posteriors across a wide range of deterministic and stochastic models from multiple scientific domains while bypassing explicit summary-statistic selection, distance functions, and tolerance tuning.
Significance. If the calibration claim holds, the work would address two persistent practical bottlenecks in ABC and could simplify inference for complex models. The distributional-forest and SMC integration constitute a methodological contribution over prior RF-ABC approaches, but the significance is conditional on empirical demonstration that the forests recover well-calibrated joint posteriors without hand-crafted summaries.
Major comments (3)
- [Abstract] Abstract: the assertion that the methods 'can accurately infer posterior distributions for a wide range of deterministic and stochastic models' supplies no quantitative results, coverage probabilities, error metrics, or baseline comparisons, so the central empirical claim cannot be assessed from the summary.
- [Section 3] Section 3: the distributional random forest construction is presented without a diagnostic (PIT histograms, posterior coverage checks, or calibration plots) that isolates whether leaf distributions remain well-calibrated when the output space is high-dimensional or the observations exhibit complex dependence; this assumption is load-bearing for the claim that summary-statistic selection is circumvented.
- [Numerical examples] Numerical examples: the reported accuracy on the chosen test cases does not contain a stress test that would reveal degradation of calibration under high-dimensional or correlated outputs; without such a check the generalization to the claimed 'wide range' of models remains unverified.
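The calibration diagnostics requested in these comments can be illustrated with a probability integral transform (PIT) check. The sketch assumes a conjugate Gaussian toy model in which the exact posterior is known; the parameter `posterior_sd_scale` is a hypothetical knob that shrinks the posterior spread to mimic an overconfident approximation. For a calibrated posterior the PIT values are uniform on [0, 1], so the histogram should be flat.

```python
import math
import random

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def pit_values(n_reps, posterior_sd_scale, rng, n_obs=10, prior_sd=2.0):
    """PIT of the true parameter under a posterior whose sd is rescaled."""
    values = []
    for _ in range(n_reps):
        theta = rng.gauss(0.0, prior_sd)                  # truth drawn from the prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        precision = 1.0 / prior_sd**2 + n_obs             # unit noise variance, zero prior mean
        mean = sum(y) / precision
        sd = posterior_sd_scale * precision ** -0.5
        values.append(normal_cdf(theta, mean, sd))
    return values

def histogram(values, n_bins=10):
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts

def chi_square_uniform(counts):
    """Pearson statistic against equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(42)
calibrated = histogram(pit_values(2000, 1.0, rng))
overconfident = histogram(pit_values(2000, 0.5, rng))
stat_calibrated = chi_square_uniform(calibrated)
stat_overconfident = chi_square_uniform(overconfident)
```

An overconfident posterior piles PIT mass into the first and last bins, giving the U-shaped histogram that such a diagnostic is meant to expose.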
Minor comments (1)
- [Abstract] The abstract would be clearer if it briefly listed the specific models and output dimensions used in the numerical studies.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on strengthening the empirical support for our claims. We address each major point below and indicate planned revisions.
- Referee: [Abstract] Abstract: the assertion that the methods 'can accurately infer posterior distributions for a wide range of deterministic and stochastic models' supplies no quantitative results, coverage probabilities, error metrics, or baseline comparisons, so the central empirical claim cannot be assessed from the summary.
  Authors: We agree that the abstract would benefit from quantitative support. In the revised version we will incorporate specific coverage probabilities, error metrics, and baseline comparisons drawn from the numerical examples to substantiate the central claim.
  Revision: yes
- Referee: [Section 3] Section 3: the distributional random forest construction is presented without a diagnostic (PIT histograms, posterior coverage checks, or calibration plots) that isolates whether leaf distributions remain well-calibrated when the output space is high-dimensional or the observations exhibit complex dependence; this assumption is load-bearing for the claim that summary-statistic selection is circumvented.
  Authors: The current presentation of the distributional random forest in Section 3 does not include explicit calibration diagnostics. We will add PIT histograms and posterior coverage checks in the revised Section 3, with discussion of calibration behavior for the output dimensions and dependence structures appearing in our examples.
  Revision: yes
- Referee: [Numerical examples] Numerical examples: the reported accuracy on the chosen test cases does not contain a stress test that would reveal degradation of calibration under high-dimensional or correlated outputs; without such a check the generalization to the claimed 'wide range' of models remains unverified.
  Authors: The numerical examples cover models from multiple domains, yet we acknowledge that dedicated stress tests for high-dimensional or strongly correlated outputs are absent. We will include such stress tests or additional calibration analysis in the revised numerical examples section to better support the generalization statement.
  Revision: yes
Circularity Check
No circularity in methodological proposal for RF-based ABC
The paper proposes distributional random forests for direct joint posterior inference and an SMC update to the prior, trained on simulated parameter-output pairs. No derivation chain, equation, or fitted quantity is shown that reduces the reported posteriors or accuracy claims to the inputs by construction. Citations to prior RF-ABC work are not load-bearing for the central claims, and the numerical examples on deterministic/stochastic models do not exhibit self-definitional or fitted-input-called-prediction patterns. The approach is a standard simulation-based methodological extension and remains self-contained.