Posterior sampling in the Age of Emulators
Pith reviewed 2026-06-28 04:02 UTC · model grok-4.3
The pith
For differentiable neural emulators in cosmology, MALA and even standard Metropolis-Hastings remain competitive with NUTS in wall-clock sampling time despite needing more steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When posterior sampling is performed with fully differentiable neural-network likelihood emulators, the No U-Turn Sampler converges in the fewest samples, but the Metropolis-Adjusted Langevin Algorithm and even standard Metropolis-Hastings achieve comparable wall-time performance because their lower per-sample computational cost offsets the need for more iterations.
What carries the argument
Direct comparison of Metropolis-Hastings, MALA, HMC, NUTS and AIES on CLiENT neural emulators that supply both fast likelihood values and automatic gradients, with whitening and covariance adaptation applied to the chains.
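To make the difference in per-step cost concrete, here is a minimal sketch of one random-walk Metropolis-Hastings step next to one MALA step. It is not the BEST implementation: the 2D standard Gaussian `log_prob` and its hand-written `grad_log_prob` are hypothetical stand-ins for an emulator likelihood and its automatic gradient, and the step size of 1.0 is an arbitrary choice for the toy target.

```python
import math
import random

def log_prob(x):
    """Toy stand-in for an emulator log-likelihood: standard 2D Gaussian."""
    return -0.5 * sum(xi * xi for xi in x)

def grad_log_prob(x):
    """Gradient of the toy target; a differentiable emulator would supply this via autodiff."""
    return [-xi for xi in x]

def accept(log_alpha, rng):
    """Metropolis accept/reject test for a log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_alpha))

def mh_step(x, step, rng):
    """Random-walk Metropolis-Hastings: symmetric Gaussian proposal, no gradient needed."""
    prop = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
    if accept(log_prob(prop) - log_prob(x), rng):
        return prop, True
    return x, False

def mala_step(x, step, rng):
    """MALA: proposal drifts along the gradient; the acceptance ratio
    includes the correction for the asymmetric proposal."""
    def drift(y):
        g = grad_log_prob(y)
        return [yi + 0.5 * step ** 2 * gi for yi, gi in zip(y, g)]

    def log_q(to, frm):
        # Log density (up to a constant) of proposing `to` when at `frm`.
        mean = drift(frm)
        return -sum((t - m) ** 2 for t, m in zip(to, mean)) / (2.0 * step ** 2)

    prop = [m + step * rng.gauss(0.0, 1.0) for m in drift(x)]
    log_alpha = (log_prob(prop) + log_q(x, prop)) - (log_prob(x) + log_q(prop, x))
    if accept(log_alpha, rng):
        return prop, True
    return x, False

def run(step_fn, step, n, seed=0):
    """Run n steps from a fixed start; return the chain and the acceptance rate."""
    rng = random.Random(seed)
    x, accepted, chain = [3.0, -3.0], 0, []
    for _ in range(n):
        x, ok = step_fn(x, step, rng)
        accepted += ok
        chain.append(x)
    return chain, accepted / n

mh_chain, mh_rate = run(mh_step, 1.0, 5000)
mala_chain, mala_rate = run(mala_step, 1.0, 5000)
print(f"MH acceptance: {mh_rate:.2f}, MALA acceptance: {mala_rate:.2f}")
```

Each MALA step calls the gradient at both the current and the proposed point, which is the extra per-sample cost that has to be weighed against its better mixing.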
If this is right
- MALA and Metropolis-Hastings become practical default choices when likelihood evaluations are cheap and gradients are available.
- Parameter whitening plus covariance adaptation raises efficiency for all tested algorithms.
- The released BEST package lets any TensorFlow likelihood be sampled without re-implementing the MCMC kernels.
- Wall-time rankings, rather than sample-count rankings, should guide sampler selection in emulator-driven inference.
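The whitening mentioned in the list above can be sketched as follows. This is an illustrative version under stated assumptions, not the BEST code: the covariance is estimated from hypothetical pilot samples, a Cholesky factor maps parameters to decorrelated unit-variance coordinates, and all function names are made up for the example. Re-estimating the covariance from the chain during burn-in would be one simple form of covariance adaptation.

```python
import math
import random

def covariance(samples):
    """Sample mean and covariance matrix of a list of parameter vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def whiten(x, mean, L):
    """Map x to whitened coordinates z by solving L z = x - mean (forward substitution)."""
    z = []
    for i in range(len(x)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((x[i] - mean[i] - s) / L[i][i])
    return z

def unwhiten(z, mean, L):
    """Map whitened coordinates back to the original parameters: x = mean + L z."""
    return [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]

# Hypothetical pilot samples: strongly correlated, with mismatched scales.
rng = random.Random(1)
pilot = []
for _ in range(2000):
    g0, g1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    pilot.append([10.0 * g0, 0.9 * g0 + 0.1 * g1])

mean, cov = covariance(pilot)
L = cholesky(cov)
white = [whiten(x, mean, L) for x in pilot]
_, white_cov = covariance(white)
print(white_cov)  # close to the identity matrix
```

A sampler would then propose in the whitened coordinates `z` and evaluate the likelihood at `unwhiten(z, mean, L)`, so a single isotropic step size suits every direction.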
Where Pith is reading between the lines
- The same wall-time advantage for simpler samplers may appear in any scientific domain that replaces expensive simulators with differentiable neural emulators.
- Emulator training objectives could be modified to favor the samplers that ultimately deliver the best wall-time performance.
- Extending the tests to non-cosmological differentiable models would check whether the observed ordering is domain-specific.
Load-bearing premise
The performance ordering seen on CLiENT emulators for LambdaCDM and the sterile-neutrino case will hold for other emulator designs and cosmological models.
What would settle it
Running the same five samplers on a different neural emulator architecture or an unrelated cosmological model and finding that NUTS is unambiguously faster in wall time would falsify the claim that MALA and MH remain competitive.
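One way such a test could be scored is effective samples per second of wall time. The sketch below is an assumption-laden illustration rather than the paper's procedure: it uses a simple effective-sample-size estimator that truncates the autocorrelation sum at the first non-positive lag, and AR(1) series with a chosen correlation `phi` stand in for real sampler output.

```python
import math
import random
import time

def ess(x):
    """Effective sample size n / tau, with the autocorrelation sum
    truncated at the first non-positive lag (a simple, rough estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    tau = 1.0
    for lag in range(1, n // 2):
        rho = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau

def timed_chain(phi, n, seed):
    """Hypothetical stand-in for a sampler: an AR(1) series with lag-1
    correlation phi and unit variance, timed with a wall clock."""
    rng = random.Random(seed)
    start = time.perf_counter()
    x, chain = 0.0, []
    for _ in range(n):
        x = phi * x + math.sqrt(1.0 - phi * phi) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain, time.perf_counter() - start

def ess_per_second(chain, seconds):
    """Wall-time efficiency: effective samples delivered per second."""
    return ess(chain) / max(seconds, 1e-9)

n = 4000
iid_chain, iid_time = timed_chain(0.0, n, seed=2)  # uncorrelated draws
ar_chain, ar_time = timed_chain(0.9, n, seed=3)    # slowly mixing draws
print(f"ESS: {ess(iid_chain):.0f} vs {ess(ar_chain):.0f}")
print(f"ESS/s: {ess_per_second(iid_chain, iid_time):.0f} vs {ess_per_second(ar_chain, ar_time):.0f}")
```

Ranking by `ess_per_second` rather than by `ess` alone is what lets a cheap, slowly mixing sampler overtake an expensive, well-mixing one.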
Original abstract
We investigate posterior sampling strategies for cosmological parameter inference using fully differentiable neural-network likelihood emulators, which provide both rapid likelihood evaluations and automatic differentiation. We compare Metropolis-Hastings (MH), the Metropolis-Adjusted Langevin Algorithm (MALA), Hamiltonian Monte Carlo (HMC), the No U-Turn Sampler (NUTS), and Affine Invariant Ensemble Sampling (AIES) using likelihood emulators constructed with the CLiENT framework. The methods are tested on emulators of both the $\Lambda$CDM model and a sterile-neutrino extension. While NUTS generally converges in the fewest samples, its higher computational cost reduces this advantage when performance is measured by wall time. As a result, MALA and even standard MH remain highly competitive. We further find that whitening and covariance adaptation substantially improve sampling efficiency. The TensorFlow implementations developed for this work are released as the BEST (Batched Emulator Sampling with TensorFlow) package, providing a general framework for sampling arbitrary TensorFlow likelihood functions. The package is available through PyPI as 'best-inference' and on GitHub (at https://github.com/AndreasNygaard/best-inference.git).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical comparison of MCMC samplers (MH, MALA, HMC, NUTS, AIES) for cosmological posterior sampling using fully differentiable neural-network likelihood emulators from the CLiENT framework. Tests are conducted on emulators for both the ΛCDM model and a sterile-neutrino extension. The central finding is that NUTS generally requires the fewest samples to converge, but its higher per-sample cost makes MALA and standard MH competitive when performance is measured by wall time; whitening and covariance adaptation are shown to improve efficiency. The authors release the BEST package for batched TensorFlow-based sampling of arbitrary likelihood functions.
Significance. If the reported performance rankings hold, the work supplies practical guidance on sampler selection for emulator-based cosmological inference, where wall-time metrics matter more than raw sample efficiency. The release of the open-source BEST package (available via PyPI and GitHub) is a clear strength, as it provides a reusable framework for TensorFlow likelihoods and supports reproducibility. The significance is reduced by the narrow scope of the tested emulators and models.
Major comments (2)
- [Abstract] Abstract: the claim that MALA and MH 'remain highly competitive' rests exclusively on timing and convergence results from CLiENT neural-network emulators for ΛCDM and the sterile-neutrino extension. Because emulator architecture affects both the cost of each likelihood+gradient call and the geometry of the posterior, the observed per-sample costs and effective-sample-size rates are not guaranteed to transfer to other differentiable emulators (different depths, activations, or non-NN surrogates). No cross-architecture ablation is described.
- [Abstract] Abstract and experimental description: the manuscript notes the effect of whitening and covariance adaptation but supplies no quantitative information on sample sizes, burn-in lengths, convergence diagnostics (e.g., R-hat thresholds), or error handling on the timing measurements. These details are required to substantiate the ranking that NUTS's sample advantage is offset by wall time.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to improve the abstract and experimental description.
Point-by-point responses
- Referee: [Abstract] Abstract: the claim that MALA and MH 'remain highly competitive' rests exclusively on timing and convergence results from CLiENT neural-network emulators for ΛCDM and the sterile-neutrino extension. Because emulator architecture affects both the cost of each likelihood+gradient call and the geometry of the posterior, the observed per-sample costs and effective-sample-size rates are not guaranteed to transfer to other differentiable emulators (different depths, activations, or non-NN surrogates). No cross-architecture ablation is described.
  Authors: We agree that the reported competitiveness of MALA and MH is specific to the CLiENT neural-network emulators and the two models tested. The abstract has been revised to qualify the claim as holding for these emulators. A new sentence has been added to the conclusions explicitly noting the lack of cross-architecture ablation and that performance may differ for other differentiable surrogates. This limitation is now stated clearly.
  Revision: yes
- Referee: [Abstract] Abstract and experimental description: the manuscript notes the effect of whitening and covariance adaptation but supplies no quantitative information on sample sizes, burn-in lengths, convergence diagnostics (e.g., R-hat thresholds), or error handling on the timing measurements. These details are required to substantiate the ranking that NUTS's sample advantage is offset by wall time.
  Authors: The experimental section has been expanded with the requested quantitative details: 20,000 post-burn-in samples per chain after discarding 5,000 burn-in samples, convergence assessed via R-hat < 1.01 and minimum effective sample size of 1,000, and timing results averaged over five independent runs with standard errors reported. These additions substantiate the wall-time comparisons.
  Revision: yes
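For readers unfamiliar with the R-hat criterion quoted in the rebuttal, the following is a minimal sketch of the classic Gelman-Rubin statistic for one parameter across several chains. Which R-hat variant the authors use (classic, split, or rank-normalized) is not stated here, so this is an assumption; the Gaussian test chains are synthetic and purely illustrative.

```python
import math
import random

def gelman_rubin(chains):
    """Classic Gelman-Rubin R-hat for one scalar parameter.
    `chains` is a list of equal-length lists of draws."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * W + B / n  # pooled variance estimate
    return math.sqrt(var_plus / W)

def gaussian_chain(offset, n, seed):
    """Synthetic chain: independent unit-variance draws around `offset`."""
    rng = random.Random(seed)
    return [offset + rng.gauss(0.0, 1.0) for _ in range(n)]

# Four chains sampling the same distribution: R-hat should sit near 1.
mixed = [gaussian_chain(0.0, 2000, seed=k) for k in range(4)]
# Four chains stuck around different values: R-hat is clearly above 1.
stuck = [gaussian_chain(float(k), 2000, seed=10 + k) for k in range(4)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(f"mixed: {rhat_mixed:.4f}, stuck: {rhat_stuck:.4f}")
```

A threshold such as R-hat < 1.01 would then be checked for every parameter, with the largest value deciding whether the run counts as converged.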
Circularity Check
No circularity; empirical runtime benchmarks on fixed emulators
Full rationale
The manuscript reports direct wall-time and convergence measurements (NUTS, MALA, MH, etc.) on CLiENT neural-network emulators for two cosmological models. No derivation chain, fitted parameter renamed as prediction, or self-citation load-bearing step exists; performance rankings are obtained by running the samplers on the supplied likelihood surfaces. The generalizability caveat noted by the skeptic is an external-validity concern, not a circularity in the reported results.