Amortized Energy-Based Bayesian Inference
Pith reviewed 2026-05-20 19:53 UTC · model grok-4.3
The pith
A transport map learned from joint samples approximates posteriors for repeated Bayesian inference in nonlinear inverse problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an observation-dependent transport map, obtained by minimizing the average energy distance to the true posterior pushforward, can be learned from joint samples alone and then used to generate approximate posterior samples for new observations in both finite- and infinite-dimensional nonlinear inverse problems.
What carries the argument
The learned observation-dependent transport map, which pushes a reference measure forward to approximate the posterior and is trained via the averaged energy-distance objective.
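For orientation, the objective can be written out as below. This is a sketch in assumed notation, not taken from the paper: $\rho$ is the reference measure, $T_\phi(\cdot\,; y)$ is the map with trainable weights $\phi$, and $D_E$ is the standard energy distance of Székely and Rizzo. The paper's own symbols may differ.

```latex
\[
\mathcal{L}(\phi)
  = \mathbb{E}_{y \sim p(y)}\!\left[
      D_E^2\!\big( p(\cdot \mid y),\; T_\phi(\cdot\,; y)_{\#}\rho \big)
    \right],
\qquad
D_E^2(\mu, \nu)
  = 2\,\mathbb{E}\|X - Y\| - \mathbb{E}\|X - X'\| - \mathbb{E}\|Y - Y'\|,
\]
% Here X, X' ~ mu and Y, Y' ~ nu are all independent.
```

On Euclidean space, $D_E^2$ is nonnegative and vanishes only when $\mu = \nu$, which is the property that makes it usable as a training target.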
If this is right
- The learned map enables rapid posterior sampling for new observations without re-solving a full inference problem each time.
- The approach works in likelihood-free settings requiring only joint samples from parameters and observations.
- Parameterization via Cameron-Martin perturbations ensures the map preserves absolute continuity with respect to Gaussian priors in function space.
- Neural operator representations allow the method to handle infinite-dimensional PDE-constrained inverse problems.
- Posterior structure including multimodality and dominant modes is captured in the learned approximations.
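The Cameron-Martin parameterization in the list above can be illustrated in coordinates. The following is a minimal sketch, assuming the prior covariance C is diagonal in a truncated eigenbasis with eigenvalues `eigvals`, and that the map has the form identity plus C^{1/2} S(u; y). The function `toy_perturbation` is a hypothetical stand-in for the learned perturbation S, not the paper's neural operator.

```python
import math
import random

def sample_prior_coeffs(eigvals, rng):
    """Draw coefficients u_k ~ N(0, lam_k) of a Gaussian prior sample."""
    return [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in eigvals]

def toy_perturbation(u, y):
    """Hypothetical S(u; y): bounded coefficients that decay with index k."""
    return [math.tanh(y - uk) / (k + 1) for k, uk in enumerate(u)]

def transport(u, y, eigvals, perturbation):
    """Identity plus C^{1/2} S(u; y), applied coefficient by coefficient."""
    s = perturbation(u, y)
    return [uk + math.sqrt(lam) * sk for uk, lam, sk in zip(u, eigvals, s)]

def cameron_martin_norm(h, eigvals):
    """Norm ||C^{-1/2} h|| of a shift h in the eigenbasis of C."""
    return math.sqrt(sum(hk * hk / lam for hk, lam in zip(h, eigvals)))

eigvals = [1.0 / (k + 1) ** 2 for k in range(32)]  # assumed spectrum of C
rng = random.Random(0)
u = sample_prior_coeffs(eigvals, rng)
v = transport(u, 0.5, eigvals, toy_perturbation)

# The shift v - u equals C^{1/2} s, so its Cameron-Martin norm equals ||s||.
shift = [vk - uk for vk, uk in zip(v, u)]
cm_norm = cameron_martin_norm(shift, eigvals)
l2_norm_s = math.sqrt(sum(sk * sk for sk in toy_perturbation(u, 0.5)))
```

The point of scaling by the square root of each eigenvalue is that any square-summable output of S yields a shift with finite Cameron-Martin norm. The sketch checks only that identity and makes no claim about the further conditions needed for absolute continuity.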
Where Pith is reading between the lines
- If the energy-distance minimization succeeds, the method could be applied to sequential data assimilation where observations arrive over time.
- Similar amortization ideas might extend to other sampling-based inference tasks beyond inverse problems.
- Replacing energy distance with alternative metrics could be explored for improved performance in specific applications.
- Validation on additional inverse problems would test the generality of the transport map parameterization.
Load-bearing premise
That the transport map obtained by minimizing the averaged energy-distance objective provides a sufficiently close approximation to the true posterior for practical use in the target applications.
What would settle it
Running independent MCMC on a new observation and comparing the resulting samples or statistics to those generated by the trained transport map; large discrepancies would indicate the learned approximation is inaccurate.
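One way to make that comparison concrete is to compute an empirical energy distance between the two sample sets. The sketch below uses the plug-in (V-statistic) estimator. The sample sets are synthetic placeholders standing in for MCMC draws and map draws, and the names are illustrative only.

```python
import math
import random

def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y), x in xs, y in ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def energy_distance_sq(xs, ys):
    """Plug-in (V-statistic) estimate of the squared energy distance."""
    return (2.0 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))

rng = random.Random(0)
# Placeholder sample sets; in practice these would come from MCMC and the map.
mcmc_samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]
map_samples_good = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]
map_samples_bad = [(rng.gauss(1.5, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]

d_good = energy_distance_sq(mcmc_samples, map_samples_good)
d_bad = energy_distance_sq(mcmc_samples, map_samples_bad)
```

A value of `d_bad` far above `d_good` is the kind of discrepancy that would flag an inaccurate approximation. How large is too large depends on the sample sizes and the application, so a threshold would need to be calibrated, for example against the distance between two independent MCMC runs.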
Original abstract
We consider amortized Bayesian inference for nonlinear inverse problems in settings where only samples from the joint distribution of parameters and observations are available. Classical methods such as Markov chain Monte Carlo require solving a new inference problem for each observation, which can be computationally prohibitive when inference must be repeated many times. We propose a transport-based approach that learns an observation-dependent map pushing forward a reference measure to approximate the posterior distribution. The map is trained by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward. This formulation is likelihood-free, requiring only joint samples, and avoids density evaluation, invertibility constraints, and Jacobian determinant computations. For function-space inverse problems with Gaussian priors, we parameterize the transport map as the identity plus a perturbation in the Cameron-Martin space of the prior, preserving absolute continuity with respect to the prior. In infinite-dimensional settings, the map is represented using neural operators. We illustrate the method on a finite-dimensional nonlinear inverse problem and two PDE-constrained inverse problems arising in porous medium flow and seismic inversion. The results show that the learned transport captures posterior structure, including multimodality and dominant modes, while enabling fast posterior sampling for new observations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an amortized Bayesian inference framework for nonlinear inverse problems that learns an observation-dependent transport map pushing a reference measure forward to approximate the posterior. The map is trained by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward, using only joint samples from p(θ, y). The approach is likelihood-free, avoids density evaluations and Jacobian computations, and for function-space problems with Gaussian priors parameterizes the map as the identity plus a Cameron-Martin perturbation, represented via neural operators in infinite dimensions. The method is illustrated on one finite-dimensional nonlinear inverse problem and two PDE-constrained problems (porous medium flow and seismic inversion), with claims that the learned map captures multimodality and dominant modes while enabling fast sampling for new observations.
Significance. If the training procedure can be made executable and the resulting approximations are shown to be accurate, the work would offer a practical advance in amortized inference for settings where repeated posterior sampling is needed and classical MCMC is too slow. The energy-distance formulation and the structure-preserving parameterization for infinite-dimensional problems are technically interesting, and the qualitative demonstrations on multimodality provide initial evidence of utility, though quantitative validation would strengthen the case.
major comments (2)
- [Training Objective (abstract and §3)] The central training procedure (minimization of the averaged energy-distance objective between p(θ|y) and the learned pushforward) is described as requiring only joint samples, yet no explicit estimator, resampling scheme, or reformulation is provided that would allow computation of the energy distance for fixed y. Joint samples from p(θ, y) typically yield at most one θ per distinct y in continuous settings, which is insufficient to estimate the posterior or the distance without additional machinery; this renders the stated objective non-executable as described and directly undermines the likelihood-free claim.
- [Numerical Experiments (§5)] The numerical results rely exclusively on qualitative visualizations of captured multimodality and dominant modes without any quantitative error metrics, convergence diagnostics, or comparisons to ground-truth posteriors (e.g., Wasserstein distances, effective sample sizes, or posterior coverage). This absence makes it impossible to assess whether the learned map provides a sufficiently accurate approximation for the target inverse-problem applications.
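Of the diagnostics named in the second comment, posterior coverage needs only fresh joint samples and a sampler for the map. The following is a minimal sketch on an assumed toy model (scalar Gaussian prior, additive Gaussian noise) where the exact posterior map is known. Here `toy_map` stands in for a trained map, and all names are hypothetical.

```python
import math
import random

NOISE_VAR = 0.25  # assumed noise variance of the toy model

def simulate_joint(rng):
    """Draw (theta, y) with theta ~ N(0, 1) and y = theta + Gaussian noise."""
    theta = rng.gauss(0.0, 1.0)
    y = theta + rng.gauss(0.0, math.sqrt(NOISE_VAR))
    return theta, y

def toy_map(z, y):
    """Exact posterior map for the toy model; stands in for a trained map."""
    post_mean = y / (1.0 + NOISE_VAR)
    post_std = math.sqrt(NOISE_VAR / (1.0 + NOISE_VAR))
    return post_mean + post_std * z

def empirical_coverage(transport, n_pairs, n_map, alpha, rng):
    """Fraction of held-out theta inside the central (1 - alpha) interval."""
    k = int(round(0.5 * alpha * n_map))  # draws trimmed from each tail
    hits = 0
    for _ in range(n_pairs):
        theta, y = simulate_joint(rng)
        draws = sorted(transport(rng.gauss(0.0, 1.0), y) for _ in range(n_map))
        if draws[k] <= theta <= draws[n_map - 1 - k]:
            hits += 1
    return hits / n_pairs

coverage = empirical_coverage(toy_map, n_pairs=2000, n_map=200, alpha=0.1,
                              rng=random.Random(0))
```

For a well-calibrated map the empirical coverage should sit near the nominal 0.9. Systematic undercoverage would suggest the learned posterior is too narrow, and overcoverage that it is too wide.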
minor comments (2)
- [§2] Notation for the energy distance and the averaging over observations could be introduced more explicitly with an equation number to improve readability.
- [Abstract] The abstract states results on 'three example problems' while the body describes one finite-dimensional case plus two PDE-constrained cases; a brief clarifying sentence would avoid minor confusion.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive report. We address each major comment below and describe the revisions we will make to the manuscript.
Point-by-point responses
- Referee: [Training Objective (abstract and §3)] The central training procedure (minimization of the averaged energy-distance objective between p(θ|y) and the learned pushforward) is described as requiring only joint samples, yet no explicit estimator, resampling scheme, or reformulation is provided that would allow computation of the energy distance for fixed y. Joint samples from p(θ, y) typically yield at most one θ per distinct y in continuous settings, which is insufficient to estimate the posterior or the distance without additional machinery; this renders the stated objective non-executable as described and directly undermines the likelihood-free claim.
  Authors: We agree that the manuscript requires a more explicit description of the Monte Carlo estimator used to approximate the averaged energy-distance objective. The current text states that the approach requires only joint samples but does not detail the finite-sample procedure, including how expectations are formed over batches of observations and how multiple pushforward samples are drawn for each fixed y. In the revised manuscript we will add a dedicated subsection in §3 that presents the empirical estimator, specifies the batching strategy over joint samples, and clarifies that the energy-distance terms involving the learned map are estimated by repeated sampling from the reference measure through the map while the cross terms are estimated from the available joint pairs. We will also note any practical requirements for generating sufficiently many map samples per observation to obtain stable estimates.
  Revision: yes
- Referee: [Numerical Experiments (§5)] The numerical results rely exclusively on qualitative visualizations of captured multimodality and dominant modes without any quantitative error metrics, convergence diagnostics, or comparisons to ground-truth posteriors (e.g., Wasserstein distances, effective sample sizes, or posterior coverage). This absence makes it impossible to assess whether the learned map provides a sufficiently accurate approximation for the target inverse-problem applications.
  Authors: We acknowledge that the present numerical section emphasizes qualitative illustrations of multimodality and mode capture. While these visualizations are useful for demonstrating the method’s qualitative behavior on the chosen examples, we agree that quantitative metrics would strengthen the evaluation. In the revised manuscript we will augment §5 with quantitative assessments: Wasserstein distances to reference posteriors on the finite-dimensional nonlinear problem (where ground truth can be obtained by long-run MCMC), posterior coverage probabilities, and effective sample size comparisons against independent MCMC runs for the PDE-constrained examples. We will also report training and inference wall-clock times to quantify the amortization benefit.
  Revision: yes
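The estimator described in the first response can be sketched as follows. Expanding the squared energy distance for fixed y gives a cross term between posterior and map samples, a within-map term, and a within-posterior term. The within-posterior term does not depend on the map, so it can be dropped from the training loss, and the remaining terms need only one θ per y. This is the standard energy-score reformulation. The source does not confirm that the authors use exactly this form, and the toy model and all names below are assumptions.

```python
import math
import random

NOISE_VAR = 0.25  # assumed noise variance of the toy model

def simulate_joint(n, rng):
    """Joint pairs (theta, y): theta ~ N(0, 1), y = theta + Gaussian noise."""
    pairs = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        pairs.append((theta, theta + rng.gauss(0.0, math.sqrt(NOISE_VAR))))
    return pairs

def exact_map(z, y):
    """Exact posterior map for the toy model (hypothetical trained map)."""
    return y / (1.0 + NOISE_VAR) + math.sqrt(NOISE_VAR / (1.0 + NOISE_VAR)) * z

def prior_map(z, y):
    """Deliberately poor map that ignores the observation."""
    return z

def energy_score_loss(pairs, transport, n_ref, rng):
    """Monte Carlo loss: mean of 2 E|theta - T(z;y)| - E|T(z;y) - T(z';y)|."""
    total = 0.0
    for theta, y in pairs:
        draws = [transport(rng.gauss(0.0, 1.0), y) for _ in range(n_ref)]
        # Cross term: one joint-sample theta against n_ref map draws.
        cross = sum(abs(theta - d) for d in draws) / n_ref
        # Within-map term: average over distinct pairs of map draws.
        within = sum(abs(draws[i] - draws[j])
                     for i in range(n_ref) for j in range(i + 1, n_ref))
        within /= n_ref * (n_ref - 1) / 2
        total += 2.0 * cross - within
    return total / len(pairs)

rng = random.Random(0)
pairs = simulate_joint(2000, rng)
loss_exact = energy_score_loss(pairs, exact_map, n_ref=8, rng=rng)
loss_prior = energy_score_loss(pairs, prior_map, n_ref=8, rng=rng)
```

In this toy setting the exact posterior map attains a lower loss than the map that ignores y, consistent with the energy score being a proper scoring rule. The number of reference draws per observation (`n_ref`) trades estimator variance against cost, which is the practical requirement the authors say they will document.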
Circularity Check
No circularity: training objective and transport map defined directly from joint samples without reduction to inputs
full rationale
The paper proposes learning an observation-dependent transport map by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward, explicitly using only joint samples from p(θ, y). This objective is stated as the training criterion without any reduction to a fitted parameter renamed as a prediction, self-definitional loop, or load-bearing self-citation for uniqueness. The infinite-dimensional parameterization (identity plus Cameron-Martin perturbation, neural operators) is given explicitly as an implementation choice. Claims about capturing multimodality are presented as empirical outcomes of the method rather than tautological derivations. The derivation chain is self-contained, with the method's executability resting on external estimation of the energy distance from joint samples rather than any internal circular equivalence.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption A transport map exists that pushes a reference measure to the target posterior
- domain assumption The energy-distance objective can be minimized to yield a useful posterior approximation
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "The map is trained by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward... likelihood-free, requiring only joint samples"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "T_θ(u; y) = u + C^{1/2} S_θ(u; y) ... perturbation lies in the Cameron–Martin space"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.