Compositional amortized inference for large-scale hierarchical Bayesian models
Pith reviewed 2026-05-22 14:19 UTC · model grok-4.3
The pith
Error-damping estimator stabilizes compositional score matching for hierarchical models with over 750,000 parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an error-damping correction applied inside compositional score matching removes the numerical instability that previously limited aggregation of many data points, while still recovering accurate posterior approximations. This enables amortized inference on hierarchical models whose joint simulation would otherwise be prohibitive. The paper verifies the fix first on synthetic benchmarks and then on a fluorescence-microscopy inverse problem whose parameter count exceeds 750,000.
What carries the argument
The error-damping estimator inside compositional score matching, which rescales or corrects the aggregated score estimates to prevent error accumulation across large numbers of observations.
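For orientation, the aggregation step that the damping correction acts on can be written down for the undiffused posterior. This is a sketch under the assumption that the $n$ observations $x_1, \dots, x_n$ are conditionally independent given $\theta$. The symbols $s_\phi$ and $e_i$ are introduced here for illustration only, and the paper's actual error-damping estimator is not reproduced.

```latex
p(\theta \mid x_{1:n}) \;\propto\; p(\theta)^{1-n} \prod_{i=1}^{n} p(\theta \mid x_i)
\quad\Longrightarrow\quad
\nabla_\theta \log p(\theta \mid x_{1:n})
  = (1-n)\,\nabla_\theta \log p(\theta)
  + \sum_{i=1}^{n} \nabla_\theta \log p(\theta \mid x_i).

% Illustrative error model (an assumption, not from the paper):
% s_\phi(\theta, x_i) = \nabla_\theta \log p(\theta \mid x_i) + e_i(\theta)
\Big\| \sum_{i=1}^{n} e_i(\theta) \Big\| \;\le\; \sum_{i=1}^{n} \big\| e_i(\theta) \big\|.
```

Because the bound on the right can grow linearly in $n$, small per-term approximation errors may dominate the aggregated score once $n$ is large, which is the regime where a rescaling or correction of the sum becomes relevant.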
If this is right
- Numerical stability holds for datasets containing up to 100,000 points on controlled benchmarks.
- Posterior accuracy is competitive with direct ABI baselines on smaller hierarchical autoregressive problems, and larger problems are handled with less than one full model simulation.
- The same procedure is applied to a real microscopy inverse problem with more than 750,000 parameters.
- Compositional amortized inference therefore becomes practical for hierarchical models whose direct simulation is computationally prohibitive.
Where Pith is reading between the lines
- The damping technique may transfer to other simulation-based inference pipelines that rely on score or gradient aggregation.
- Similar corrections could be tested on hierarchical models in fields such as systems biology or neuroimaging where data volumes are comparably large.
- The method invites direct comparison against non-compositional baselines on the same 750,000-parameter microscopy task to quantify the exact simulation savings.
Load-bearing premise
The error-damping estimator continues to preserve statistical accuracy, not just numerical stability, when the underlying diffusion approximations are applied to real noisy scientific measurements.
What would settle it
Parameter recovery on the microscopy dataset deviates markedly from independent reference estimates or ground-truth values once the number of aggregated points exceeds a few tens of thousands.
Original abstract
Amortized Bayesian inference (ABI) with neural networks has emerged as a powerful simulation-based approach for estimating complex mechanistic models. However, extending ABI to hierarchical models, a cornerstone of modern Bayesian analysis, has been a major hurdle due to the need to simulate and process massive datasets. Our study tackles these challenges by extending compositional score matching (CSM), a divide-and-conquer strategy for Bayesian updating using diffusion models. We develop a new error-damping estimator to address previous stability issues of CSM when aggregating large numbers of data points. We first verified the numerical stability with up to 100,000 data points on a controlled benchmark. We then evaluated our method on a hierarchical AR model, achieving competitive performance to direct ABI baselines on smaller problem sizes while using less than one full model simulation for larger problem sizes. Finally, we address a large-scale inverse problem in advanced microscopy with over 750,000 parameters, demonstrating its relevance to real scientific applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript extends compositional score matching (CSM) for amortized Bayesian inference to large-scale hierarchical models by introducing an error-damping estimator that mitigates stability issues when aggregating many data points. It reports numerical stability on a controlled benchmark with up to 100,000 data points, performance competitive with direct ABI baselines on smaller instances of a hierarchical AR model while using less than one full model simulation on larger instances, and a demonstration on an advanced microscopy inverse problem with over 750,000 parameters.
Significance. If the error-damping estimator is shown to preserve statistical accuracy in addition to numerical stability, the approach could enable practical amortized inference for complex hierarchical models in data-intensive scientific domains such as microscopy, offering substantial computational savings over direct methods for problems with hundreds of thousands of parameters.
major comments (2)
- [Abstract / large-scale inverse problem demonstration] Abstract and microscopy demonstration: the central claim that the error-damping estimator addresses stability while preserving correctness is load-bearing, yet the 750,000-parameter inverse problem is presented only as a relevance demonstration without reported metrics (posterior mean error, coverage, or comparison to a subsampled direct baseline) under the actual noise model of the microscopy data.
- [Numerical stability verification and hierarchical AR evaluation] Benchmark and AR model sections: the effect of the error-damping strength on the statistical properties of the aggregated posterior (bias, variance, or calibration) is not explicitly quantified, leaving open the possibility that stability gains come at the cost of under-dispersion or systematic bias when diffusion-model approximations encounter non-synthetic likelihoods.
minor comments (2)
- [Methods] Clarify how the free parameter for error-damping strength is selected or tuned across experiments, including any sensitivity analysis.
- [Figures] Ensure all figure captions explicitly state the number of data points, parameter count, and whether results are on synthetic or real data.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the scope and limitations of our work. We address each major comment below and indicate where revisions will be made to the manuscript.
- Referee: [Abstract / large-scale inverse problem demonstration] Abstract and microscopy demonstration: the central claim that the error-damping estimator addresses stability while preserving correctness is load-bearing, yet the 750,000-parameter inverse problem is presented only as a relevance demonstration without reported metrics (posterior mean error, coverage, or comparison to a subsampled direct baseline) under the actual noise model of the microscopy data.
  Authors: We agree that the microscopy demonstration is presented without quantitative metrics such as posterior mean error or coverage, and that a direct comparison to a subsampled baseline is absent. This section is explicitly framed as a relevance demonstration to show applicability to a real scientific problem at a scale where direct methods become intractable. We will revise the abstract and the demonstration section to more clearly state these limitations and emphasize that statistical accuracy claims rest on the controlled benchmarks and hierarchical AR evaluations rather than the microscopy example.
  Revision: partial
- Referee: [Numerical stability verification and hierarchical AR evaluation] Benchmark and AR model sections: the effect of the error-damping strength on the statistical properties of the aggregated posterior (bias, variance, or calibration) is not explicitly quantified, leaving open the possibility that stability gains come at the cost of under-dispersion or systematic bias when diffusion-model approximations encounter non-synthetic likelihoods.
  Authors: We acknowledge that the manuscript does not include an explicit sensitivity analysis varying the error-damping strength and reporting its effects on bias, variance, or calibration. The existing evaluations show numerical stability up to 100,000 points and competitive performance versus direct ABI baselines on the AR model. We will add a targeted analysis (in the main text or supplement) that quantifies these statistical properties across a range of damping strengths on the synthetic benchmarks to address this concern directly.
  Revision: yes
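The statistical properties named in this exchange (bias, dispersion, calibration) can be measured with a few lines of code. The following is a minimal sketch on a conjugate Gaussian toy model, not the authors' planned analysis. The function name `coverage_diagnostics` and the `shrink` knob are hypothetical: `shrink` simply scales the posterior standard deviation to mimic an under-dispersed approximation and is not the paper's error-damping strength.

```python
import numpy as np

def coverage_diagnostics(shrink, n_reps=2000, n_obs=50, n_draws=500, sigma=1.0, seed=0):
    """Bias, dispersion ratio and 90% interval coverage on a Gaussian toy model.

    shrink is a hypothetical knob: 1.0 samples from the exact posterior,
    values below 1.0 mimic an under-dispersed approximation.
    """
    rng = np.random.default_rng(seed)
    # Ground-truth parameters from the prior N(0, sigma^2).
    eta_true = rng.normal(0.0, sigma, size=n_reps)
    # n_obs observations per replication, Y_j ~ N(eta, sigma^2).
    y = eta_true[:, None] + rng.normal(0.0, sigma, size=(n_reps, n_obs))
    # Exact conjugate posterior: mean sum(Y) / (J + 1), sd sigma / sqrt(J + 1).
    post_mean = y.sum(axis=1) / (n_obs + 1)
    post_sd = sigma / np.sqrt(n_obs + 1)
    # Approximate posterior draws with (possibly) shrunken spread.
    draws = post_mean[:, None] + shrink * post_sd * rng.standard_normal((n_reps, n_draws))
    lo, hi = np.quantile(draws, [0.05, 0.95], axis=1)
    return {
        "bias": float(np.mean(draws.mean(axis=1) - eta_true)),
        "sd_ratio": float(np.mean(draws.std(axis=1, ddof=1)) / post_sd),
        "coverage_90": float(np.mean((eta_true >= lo) & (eta_true <= hi))),
    }

exact = coverage_diagnostics(shrink=1.0)
narrow = coverage_diagnostics(shrink=0.5)
for name, res in (("exact", exact), ("under-dispersed", narrow)):
    print(name, res)
```

In this toy setting an under-dispersed posterior leaves the bias essentially unchanged while the empirical coverage of the nominal 90% interval falls well below 0.9, which is why a stability check alone would not reveal the problem the referee describes.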
Circularity Check
No circularity: new estimator introduced and externally verified
The paper presents a new error-damping estimator as an extension to compositional score matching (CSM) for handling large numbers of data points in hierarchical amortized Bayesian inference. Numerical stability is checked on a controlled benchmark with up to 100,000 points, performance is compared to direct ABI baselines on a hierarchical AR model, and relevance is shown via demonstration on a 750k-parameter microscopy inverse problem. No derivation step reduces by construction to a fitted quantity, self-citation chain, or renamed input; the central technical contribution is independently motivated and tested against external benchmarks rather than being equivalent to its own assumptions or prior fitted values.
Axiom & Free-Parameter Ledger
free parameters (1)
- error-damping strength
axioms (1)
- domain assumption: Diffusion models provide sufficiently accurate score estimates for the sub-problems that arise in compositional score matching.
Forward citations
Cited by 2 Pith papers
- Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference
  Derives Wasserstein bounds and explicit hyperparameter tuning rules for annealed Langevin dynamics in compositional score-based SBI, proving Linhart et al. (2026) allows larger steps and fewer total steps than Geffner...
- Tokenised Flow Matching for Hierarchical Simulation Based Inference
  TFMPE combines likelihood factorisation with tokenised flow matching to enable efficient hierarchical SBI from single-site simulations, producing well-calibrated posteriors at lower computational cost on a new benchma...
Reference graph
Works this paper leans on
- [1]
- [2]
  URL https://doi.org/10.1080/01621459.2017.1307116
  ISSN 0162-1459. doi: 10.1080/01621459.2017.1285773. J. Boelts, M. Deistler, M. Gloeckler, Á. Tejero-Cantero, J.-M. Lueckmann, G. Moss, P. Steinbach, T. Moreau, F. Muratore, J. Linhart, et al. sbi reloaded: a toolkit for simulation-based inference workflows. Journal of Open Source Software, 10(108):7754,
- [3]
  Published as a conference paper at ICLR 2026. B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell. Stan: A probabilistic programming language. Journal of Statistical Software, 76:1–32,
- [4]
- [5]
  doi: 10.1111/j.1467-9868.2007.00587.x. D. Habermann, M. Schmitt, L. Kühmichel, A. Bulling, S. T. Radev, and P.-C. Bürkner. Amortized Bayesian multilevel models. CoRR, abs/2408.13230,
- [6]
  Gotta go fast when generating data with score-based models,
  A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas. Gotta go fast when generating data with score-based models. arXiv preprint arXiv:2105.14080,
- [7]
- [8]
  J. Linhart, G. Cardoso, A. Gramfort, S. L. Corff, and P. L. C. Rodrigues. Diffusion posterior sampling for simulation-based inference in tall data settings. Transactions on Machine Learning Research,
- [9]
  J. T. Smith, R. Yao, N. Sinsuebphon, A. Rudkouskaya, N. Un, J. Mazurkiewicz, M. Barroso, P. Yan, and X. Intes. Fast fit-free analysis of fluorescence lifetime imaging via deep learning. Proceedings of the National Academy of Sciences, 116(48):24019–24030,
- [10]
  doi: 10.1364/opticaopen.28094186.v1. M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola. Deep sets. Advances in Neural Information Processing Systems, 30,
- [11]
  Y. Zhang and L. Mikelsons. Solving stochastic inverse problems with stochastic BayesFlow. In 2023 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pages 966–972,
- [12]
  (2021b): $d\theta_t = f(\theta_t, t)\,dt + g(t)\,dW_t$
  A APPENDIX. A.1 STOCHASTIC DIFFERENTIAL EQUATION FORMULATION OF THE DIFFUSION PROCESS. The forward diffusion process for $t \in [0,1]$ can be specified as a stochastic differential equation (Song et al., 2021b): $d\theta_t = f(\theta_t, t)\,dt + g(t)\,dW_t$. For a known variance-preserving process, the drift and diffusion coefficients...
- [13]
  and ReLU activations, projecting to the final output dimension.
  • Time series summary network: For structured input data such as time series (as in the FLI application), we use a hybrid convolutional–recurrent architecture. The model begins with a stack of 1D convolutional layers followed by a skipping recurrent path, as implemented in (Zhang and Mikelsons...
- [14]
  We parameterize our score models to predict the more stable velocity $\hat{v}_t := \alpha_t \epsilon - \sigma_t \theta_t$, and then transform the output to noise $\hat{\epsilon}_t$, as it has been shown that this parameterization is more stable for all $t$, whereas noise-prediction becomes harder for $t$ close to 0 where the signal increases and noise decreases (Salimans and Ho, 2022). Furthermore, we conditio...
- [15]
  We observe $\{Y_j\}_{j=1}^{J}$ with varying $J$ and compute the posterior $p(\eta \mid \{Y_j\}_{j=1}^{J})$. Given a normal prior for $\eta$, $\eta \sim \mathcal{N}(0, \sigma^2 I)$, the posterior is also Gaussian, and we can calculate it analytically: $p(\eta \mid \{Y_j\}_{j=1}^{J}) \propto \exp\left(-\tfrac{1}{2}(\eta - \mu_J)^\top \Sigma_J^{-1} (\eta - \mu_J)\right)$, where $\mu_J = \frac{1}{J+1}\sum_{j=1}^{J} Y_j$ and $\Sigma_J^{-1} = \frac{J+1}{\sigma^2} I$. Here, we did not employ a summary network...
- [16]
  We used 4 parallel chains, each generating 1,000 samples with default settings in Stan
  performs better on non-centered parameterizations (Betancourt and Girolami, 2015). We used 4 parallel chains, each generating 1,000 samples with default settings in Stan. Here, we do not employ a summary network. For the direct hierarchical ABI methods (Heinrich et al., 2024; Habermann et al., 2024), we employ
  • ABI-NF: Normalizing flow with 2 coupling la...
- [17]
  Convergence is achieved only for the smallest dataset
  [Figure: panels plot Number of Steps (with Max Steps marked), RMSE (global and local), Calibration Error (global), and Contraction (global and local) against Data Size; legend values 1, 10, 16, 100, 256, 4096, 16384.] (b) Varying mini-batch sizes. Convergen...
- [18]
  The real data were also normalized to 1 on a pixel-wise level. Instrument response function (IRF). The emitted signals are recorded using multiple instruments (detectors, electronics, etc.) which have a characteristic response $E(t)$ to an instantaneous signal $\delta(t)$ (e.g., a single photon). The recorded signals from the $T$-periodic emitted signal can be writte...
- [19]
  Here, we employed a time-series summary network. For comparison, we also trained a diffusion model of the same size as ours on the flat model using the same prior and simulation budget, but only targeting the local per-pixel parameters without conditioning on global parameters. Data. AU565 (HER2+ human breast carcinoma) cells, incubated for 24 h with 20 µg/m...
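The Gaussian benchmark quoted in entry [15] has a closed-form posterior, so it can serve as a quick numerical check of score composition. The sketch below assumes the likelihood $Y_j \sim \mathcal{N}(\eta, \sigma^2 I)$, which is inferred from the quoted formulas for $\mu_J$ and $\Sigma_J^{-1}$ rather than stated in the excerpt. All function names are hypothetical, and `per_term_error` is an illustrative constant bias added to every single-observation score; it is not the paper's error-damping estimator.

```python
import numpy as np

def prior_score(eta, sigma):
    # Score of the prior N(0, sigma^2 I).
    return -eta / sigma**2

def single_posterior_score(eta, y_j, sigma):
    # Under the assumed likelihood, p(eta | Y_j) = N(Y_j / 2, (sigma^2 / 2) I).
    return -(2.0 / sigma**2) * (eta - y_j / 2.0)

def composed_score(eta, ys, sigma, per_term_error=0.0):
    # (1 - J) * prior score + sum of the J single-observation posterior scores.
    # per_term_error is a hypothetical bias standing in for network error.
    J = ys.shape[0]
    individual = single_posterior_score(eta[None, :], ys, sigma) + per_term_error
    return (1 - J) * prior_score(eta, sigma) + individual.sum(axis=0)

def exact_posterior_score(eta, ys, sigma):
    # Score of N(mu_J, Sigma_J), mu_J = sum_j Y_j / (J + 1), Sigma_J^{-1} = (J + 1) / sigma^2 I.
    J = ys.shape[0]
    mu_J = ys.sum(axis=0) / (J + 1)
    return -((J + 1) / sigma**2) * (eta - mu_J)

rng = np.random.default_rng(0)
sigma, dim = 1.5, 3
eta_query = rng.normal(size=dim)
errors = {}
for J in (10, 1000, 100000):
    ys = rng.normal(size=(J, dim))
    exact = exact_posterior_score(eta_query, ys, sigma)
    clean = composed_score(eta_query, ys, sigma)
    biased = composed_score(eta_query, ys, sigma, per_term_error=1e-3)
    # With exact single-observation scores, composition matches the analytic posterior.
    assert np.allclose(clean, exact, rtol=1e-8, atol=1e-6)
    # With a small constant bias per term, the aggregated error scales with J.
    errors[J] = float(np.max(np.abs(biased - exact)))
    print(J, errors[J])
```

With exact scores the composition agrees with the analytic posterior at every data size, while a fixed per-term bias accumulates in proportion to $J$. This is the kind of growth that a damping correction is meant to control, although the toy bias here is far simpler than the errors of a trained diffusion model.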