Amortized Energy-Based Bayesian Inference

Andrew M. Stuart; Hojjat Kaveh; Ricardo Baptista

arxiv: 2605.15407 · v2 · pith:4HXAPRVFnew · submitted 2026-05-14 · 🧮 math.NA · cs.AI · cs.NA


Pith reviewed 2026-05-20 19:53 UTC · model grok-4.3


keywords amortized inference · transport maps · Bayesian inverse problems · energy distance · neural operators · posterior approximation · likelihood free


The pith

A transport map learned from joint samples approximates posteriors for repeated Bayesian inference in nonlinear inverse problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an amortized approach to Bayesian inference for nonlinear inverse problems, where the same inference task must be solved for many different observations. Rather than using MCMC to solve a new problem each time, it learns a map that takes an observation and pushes a reference distribution to approximate the corresponding posterior. Training minimizes an averaged energy-distance objective using only samples from the joint distribution of parameters and observations, making the method likelihood-free. In function-space settings with Gaussian priors, the map is parameterized as the identity plus a perturbation in the Cameron-Martin space to maintain absolute continuity, with neural operators used for the infinite-dimensional representation. Demonstrations on a finite-dimensional example and two PDE inverse problems show that the map recovers multimodal structure and supports fast sampling for unseen observations.

Core claim

The central claim is that an observation-dependent transport map, obtained by minimizing the average energy distance to the true posterior pushforward, can be learned from joint samples alone and then used to generate approximate posterior samples for new observations in both finite- and infinite-dimensional nonlinear inverse problems.

What carries the argument

The learned observation-dependent transport map, which pushes a reference measure forward to approximate the posterior and is trained via the averaged energy-distance objective.
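For orientation, the objective named here can be written out with the standard squared energy distance between two probability measures. The notation below is an assumption made for illustration and may differ from the paper's: u is the unknown (the referee report writes it as θ), ρ is the reference measure, T_θ(·; y) is the learned map with trainable weights θ, and p(y) is the marginal law of the observations.

```latex
% Squared energy distance between measures \mu and \nu,
% with X, X' \sim \mu and Y, Y' \sim \nu mutually independent:
\[
  D_E^2(\mu, \nu)
    = 2\,\mathbb{E}\,\|X - Y\|
      - \mathbb{E}\,\|X - X'\|
      - \mathbb{E}\,\|Y - Y'\| .
\]
% Averaged objective over observations (assumed notation):
\[
  J(\theta)
    = \mathbb{E}_{y \sim p(y)}
      \Big[ D_E^2\big( p(\cdot \mid y),\; T_\theta(\cdot\,; y)_{\#}\rho \big) \Big] .
\]
```

On Euclidean space, for measures with finite first moment, the energy distance is nonnegative and vanishes only when the two measures coincide, which is what makes it usable as a training criterion.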

If this is right

The learned map enables rapid posterior sampling for new observations without resolving a full inference problem each time.
The approach works in likelihood-free settings requiring only joint samples from parameters and observations.
Parameterization via Cameron-Martin perturbations ensures the map preserves absolute continuity with respect to Gaussian priors in function space.
Neural operator representations allow the method to handle infinite-dimensional PDE-constrained inverse problems.
Posterior structure including multimodality and dominant modes is captured in the learned approximations.
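The Cameron-Martin parameterization in the list above can be illustrated with a small discretized sketch. It assumes the prior covariance C is diagonal in a truncated eigenbasis with eigenvalues k^(-2), so that C^{1/2} acts mode by mode; `toy_perturbation` is a hypothetical stand-in for the learned neural operator, not the paper's architecture.

```python
import math
import random


def prior_sqrt_eigenvalues(n_modes, decay=2.0):
    """Square roots of the assumed prior eigenvalues lambda_k = k**(-decay)."""
    return [k ** (-decay / 2.0) for k in range(1, n_modes + 1)]


def sample_prior(sqrt_eigs, rng):
    """Coefficients of u ~ N(0, C) in the eigenbasis: u_k = sqrt(lambda_k) * xi_k."""
    return [s * rng.gauss(0.0, 1.0) for s in sqrt_eigs]


def toy_perturbation(u, y):
    """Hypothetical S(u; y) with square-summable coefficients (bounded by 1/k)."""
    shift = sum(y) / len(y)
    return [math.tanh(u_k + shift) / k for k, u_k in enumerate(u, start=1)]


def transport(u, y, sqrt_eigs, perturbation=toy_perturbation):
    """T(u; y) = u + C^{1/2} S(u; y), applied mode by mode."""
    s = perturbation(u, y)
    return [u_k + c_k * s_k for u_k, c_k, s_k in zip(u, sqrt_eigs, s)]


rng = random.Random(0)
sqrt_eigs = prior_sqrt_eigenvalues(n_modes=16)
u = sample_prior(sqrt_eigs, rng)
y = [0.3, -0.1, 0.5]  # made-up observation vector
v = transport(u, y, sqrt_eigs)

# Squared Cameron-Martin norm of the shift h = v - u is sum_k h_k**2 / lambda_k.
# It equals the squared norm of S(u; y), which stays below pi**2 / 6 here
# no matter how many modes are kept.
cm_norm_sq = sum(((v_k - u_k) / c_k) ** 2 for u_k, v_k, c_k in zip(u, v, sqrt_eigs))
print(f"squared Cameron-Martin norm of the shift: {cm_norm_sq:.4f}")
```

The sketch only checks that the shift has finite Cameron-Martin norm uniformly in the truncation level; the absolute-continuity statement itself is the paper's claim and is not verified by this code.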

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the energy-distance minimization succeeds, the method could be applied to sequential data assimilation where observations arrive over time.
Similar amortization ideas might extend to other sampling-based inference tasks beyond inverse problems.
Replacing energy distance with alternative metrics could be explored for improved performance in specific applications.
Validation on additional inverse problems would test the generality of the transport map parameterization.

Load-bearing premise

That the transport map obtained by minimizing the averaged energy-distance objective provides a sufficiently close approximation to the true posterior for practical use in the target applications.

What would settle it

Running independent MCMC on a new observation and comparing the resulting samples or statistics to those generated by the trained transport map; large discrepancies would indicate the learned approximation is inaccurate.
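One way to make such a comparison quantitative is to compute a sample-based energy distance between the two sample sets. The sketch below uses the plug-in estimator on synthetic Gaussian clouds; the clouds are made-up stand-ins for MCMC output and map output, and nothing here reproduces the paper's experiments.

```python
import math
import random


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y), x in xs and y in ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_distance_sq(xs, ys):
    """Plug-in estimate of the squared energy distance between two sample sets."""
    return (
        2.0 * mean_pairwise_distance(xs, ys)
        - mean_pairwise_distance(xs, xs)
        - mean_pairwise_distance(ys, ys)
    )


def gaussian_cloud(n, center, rng):
    """Synthetic stand-in for posterior samples (unit-variance Gaussian)."""
    return [tuple(c + rng.gauss(0.0, 1.0) for c in center) for _ in range(n)]


rng = random.Random(1)
mcmc_like = gaussian_cloud(200, (0.0, 0.0), rng)  # plays the role of MCMC output
map_good = gaussian_cloud(200, (0.0, 0.0), rng)   # map samples from the same law
map_bad = gaussian_cloud(200, (2.0, 0.0), rng)    # map samples with a shifted mean

d_good = energy_distance_sq(mcmc_like, map_good)
d_bad = energy_distance_sq(mcmc_like, map_bad)
print(f"matched: {d_good:.3f}, shifted: {d_bad:.3f}")
```

A value close to zero is consistent with agreement between the two sample sets, while a clearly larger value flags a discrepancy. The plug-in estimate is biased upward at finite sample size, so it is best read relative to a baseline such as two independent MCMC runs.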

Original abstract

We consider amortized Bayesian inference for nonlinear inverse problems in settings where only samples from the joint distribution of parameters and observations are available. Classical methods such as Markov chain Monte Carlo require solving a new inference problem for each observation, which can be computationally prohibitive when inference must be repeated many times. We propose a transport-based approach that learns an observation-dependent map pushing forward a reference measure to approximate the posterior distribution. The map is trained by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward. This formulation is likelihood-free, requiring only joint samples, and avoids density evaluation, invertibility constraints, and Jacobian determinant computations. For function-space inverse problems with Gaussian priors, we parameterize the transport map as the identity plus a perturbation in the Cameron-Martin space of the prior, preserving absolute continuity with respect to the prior. In infinite-dimensional settings, the map is represented using neural operators. We illustrate the method on a finite-dimensional nonlinear inverse problem and two PDE-constrained inverse problems arising in porous medium flow and seismic inversion. The results show that the learned transport captures posterior structure, including multimodality and dominant modes, while enabling fast posterior sampling for new observations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The transport-map approach with energy-distance training targets a real bottleneck in repeated inverse-problem inference, but the objective is defined through the true posterior, so the paper needs a concrete estimator built from joint samples alone before the method is executable.


The paper learns an observation-dependent transport map that pushes a reference measure to an approximate posterior by minimizing an averaged energy-distance objective, all from joint samples and without likelihoods or Jacobians. For Gaussian priors in function space it adds a Cameron-Martin perturbation and represents the map with neural operators. That combination is the main novelty and it is aimed squarely at PDE-constrained problems where you have to do inference many times, such as porous-medium flow or seismic inversion. The examples show the map recovering multimodality and dominant modes at least qualitatively, which is useful for those domains and gives fast sampling once the map is trained.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an amortized Bayesian inference framework for nonlinear inverse problems that learns an observation-dependent transport map pushing a reference measure forward to approximate the posterior. The map is trained by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward, using only joint samples from p(θ, y). The approach is likelihood-free, avoids density evaluations and Jacobian computations, and for function-space problems with Gaussian priors parameterizes the map as the identity plus a Cameron-Martin perturbation, represented via neural operators in infinite dimensions. The method is illustrated on one finite-dimensional nonlinear inverse problem and two PDE-constrained problems (porous medium flow and seismic inversion), with claims that the learned map captures multimodality and dominant modes while enabling fast sampling for new observations.

Significance. If the training procedure can be made executable and the resulting approximations are shown to be accurate, the work would offer a practical advance in amortized inference for settings where repeated posterior sampling is needed and classical MCMC is too slow. The energy-distance formulation and the structure-preserving parameterization for infinite-dimensional problems are technically interesting, and the qualitative demonstrations on multimodality provide initial evidence of utility, though quantitative validation would strengthen the case.

major comments (2)

[Training Objective (abstract and §3)] The central training procedure (minimization of the averaged energy-distance objective between p(θ|y) and the learned pushforward) is described as requiring only joint samples, yet no explicit estimator, resampling scheme, or reformulation is provided that would allow computation of the energy distance for fixed y. Joint samples from p(θ, y) typically yield at most one θ per distinct y in continuous settings, which is insufficient to estimate the posterior or the distance without additional machinery; this renders the stated objective non-executable as described and directly undermines the likelihood-free claim.
[Numerical Experiments (§5)] The numerical results rely exclusively on qualitative visualizations of captured multimodality and dominant modes without any quantitative error metrics, convergence diagnostics, or comparisons to ground-truth posteriors (e.g., Wasserstein distances, effective sample sizes, or posterior coverage). This absence makes it impossible to assess whether the learned map provides a sufficiently accurate approximation for the target inverse-problem applications.

minor comments (2)

[§2] Notation for the energy distance and the averaging over observations could be introduced more explicitly with an equation number to improve readability.
[Abstract] The abstract states results on 'three example problems' while the body describes one finite-dimensional case plus two PDE-constrained cases; a brief clarifying sentence would avoid minor confusion.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive report. We address each major comment below and describe the revisions we will make to the manuscript.


Referee: [Training Objective (abstract and §3)] The central training procedure (minimization of the averaged energy-distance objective between p(θ|y) and the learned pushforward) is described as requiring only joint samples, yet no explicit estimator, resampling scheme, or reformulation is provided that would allow computation of the energy distance for fixed y. Joint samples from p(θ, y) typically yield at most one θ per distinct y in continuous settings, which is insufficient to estimate the posterior or the distance without additional machinery; this renders the stated objective non-executable as described and directly undermines the likelihood-free claim.

Authors: We agree that the manuscript requires a more explicit description of the Monte Carlo estimator used to approximate the averaged energy-distance objective. The current text states that the approach requires only joint samples but does not detail the finite-sample procedure, including how expectations are formed over batches of observations and how multiple pushforward samples are drawn for each fixed y. In the revised manuscript we will add a dedicated subsection in §3 that presents the empirical estimator, specifies the batching strategy over joint samples, and clarifies that the energy-distance terms involving the learned map are estimated by repeated sampling from the reference measure through the map while the cross terms are estimated from the available joint pairs. We will also note any practical requirements for generating sufficiently many map samples per observation to obtain stable estimates.
revision: yes
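An estimator of the kind described in this response can be sketched as follows. It relies on a standard identity: in the averaged squared energy distance, the term involving two independent posterior draws does not depend on the map, so it can be dropped, and the remaining terms need only one unknown per observation. This is an illustrative sketch under that identity, not the paper's implementation; the toy linear-Gaussian model, the standard normal reference, and the names `energy_score_loss`, `exact_map`, and `prior_map` are all assumptions.

```python
import math
import random


def energy_score_loss(pairs, transport, n_map_samples, rng):
    """Monte Carlo estimate of the averaged energy distance, up to a constant.

    pairs: joint samples (u_i, y_i) as tuples; one u per y is enough.
    transport: callable (z, y) -> pushed sample, a stand-in for the learned map.
    """
    m = n_map_samples
    total = 0.0
    for u, y in pairs:
        # Push m reference draws z ~ N(0, I) through the map for this fixed y.
        pushed = [
            transport(tuple(rng.gauss(0.0, 1.0) for _ in u), y) for _ in range(m)
        ]
        # Cross term: uses the single joint sample u_i paired with y_i.
        cross = sum(math.dist(u, p) for p in pushed) / m
        # Self term: average distance over distinct pairs of map samples.
        within = sum(
            math.dist(p, q) for i, p in enumerate(pushed) for q in pushed[i + 1:]
        ) / (m * (m - 1) / 2)
        total += 2.0 * cross - within
    return total / len(pairs)


# Toy model (an assumption for this demo): u ~ N(0, 1), y = u + N(0, 1) noise,
# so the posterior is N(y / 2, 1 / 2).
rng = random.Random(0)
pairs = []
for _ in range(500):
    u_val = rng.gauss(0.0, 1.0)
    pairs.append(((u_val,), (u_val + rng.gauss(0.0, 1.0),)))


def exact_map(z, y):
    """Pushes N(0, 1) to the exact posterior N(y / 2, 1 / 2)."""
    return (y[0] / 2.0 + z[0] / math.sqrt(2.0),)


def prior_map(z, y):
    """Ignores the observation and returns a prior draw."""
    return (z[0],)


loss_exact = energy_score_loss(pairs, exact_map, n_map_samples=16, rng=rng)
loss_prior = energy_score_loss(pairs, prior_map, n_map_samples=16, rng=rng)
print(f"exact map: {loss_exact:.3f}, prior map: {loss_prior:.3f}")
```

In this toy setting the exact posterior map attains the lower loss, as expected from a criterion that is minimized by the true posterior. In practice the map would be a trainable network and the loss would be differentiated with an automatic-differentiation library, which this standard-library sketch does not attempt.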
Referee: [Numerical Experiments (§5)] The numerical results rely exclusively on qualitative visualizations of captured multimodality and dominant modes without any quantitative error metrics, convergence diagnostics, or comparisons to ground-truth posteriors (e.g., Wasserstein distances, effective sample sizes, or posterior coverage). This absence makes it impossible to assess whether the learned map provides a sufficiently accurate approximation for the target inverse-problem applications.

Authors: We acknowledge that the present numerical section emphasizes qualitative illustrations of multimodality and mode capture. While these visualizations are useful for demonstrating the method’s qualitative behavior on the chosen examples, we agree that quantitative metrics would strengthen the evaluation. In the revised manuscript we will augment §5 with quantitative assessments: Wasserstein distances to reference posteriors on the finite-dimensional nonlinear problem (where ground truth can be obtained by long-run MCMC), posterior coverage probabilities, and effective sample size comparisons against independent MCMC runs for the PDE-constrained examples. We will also report training and inference wall-clock times to quantify the amortization benefit.
revision: yes
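Of the metrics listed in this response, posterior coverage is simple to sketch. For held-out joint pairs, one builds a central credible interval from approximate posterior draws for each observation and counts how often the paired true unknown falls inside; averaged over the joint distribution, an exact posterior sampler should hit the nominal level. The toy linear-Gaussian model and both samplers below are assumptions for illustration only.

```python
import math
import random


def central_interval(draws, level):
    """Empirical central credible interval from scalar posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    lo_idx = int(round((1.0 - level) / 2.0 * (n - 1)))
    hi_idx = int(round((1.0 + level) / 2.0 * (n - 1)))
    return ordered[lo_idx], ordered[hi_idx]


def coverage(pairs, sampler, level, n_draws, rng):
    """Fraction of held-out truths u_i inside the interval built for y_i."""
    hits = 0
    for u, y in pairs:
        lo, hi = central_interval([sampler(y, rng) for _ in range(n_draws)], level)
        hits += lo <= u <= hi
    return hits / len(pairs)


# Toy model (assumed): u ~ N(0, 1), y = u + N(0, 1), posterior N(y / 2, 1 / 2).
rng = random.Random(2)
held_out = []
for _ in range(1000):
    u_true = rng.gauss(0.0, 1.0)
    held_out.append((u_true, u_true + rng.gauss(0.0, 1.0)))

POST_STD = math.sqrt(0.5)


def exact_sampler(y, rng):
    """Draw from the exact posterior."""
    return rng.gauss(y / 2.0, POST_STD)


def overconfident_sampler(y, rng):
    """Correct mean, but spread that is three times too small."""
    return rng.gauss(y / 2.0, POST_STD / 3.0)


cov_exact = coverage(held_out, exact_sampler, level=0.9, n_draws=200, rng=rng)
cov_narrow = coverage(held_out, overconfident_sampler, level=0.9, n_draws=200, rng=rng)
print(f"exact: {cov_exact:.3f}, overconfident: {cov_narrow:.3f}")
```

Coverage well below the nominal level points to an overconfident approximation, and coverage well above it to an overdispersed one. For vector-valued or function-valued unknowns the same check would be applied to scalar summaries, for example individual coefficients.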

Circularity Check

0 steps flagged

No circularity: training objective and transport map defined directly from joint samples without reduction to inputs


The paper proposes learning an observation-dependent transport map by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward, explicitly using only joint samples from p(θ, y). This objective is stated as the training criterion without any reduction to a fitted parameter renamed as a prediction, self-definitional loop, or load-bearing self-citation for uniqueness. The infinite-dimensional parameterization (identity plus Cameron-Martin perturbation, neural operators) is given explicitly as an implementation choice. Claims about capturing multimodality are presented as empirical outcomes of the method rather than tautological derivations. The derivation chain is self-contained, with the method's executability resting on external estimation of the energy distance from joint samples rather than any internal circular equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard optimal-transport and Bayesian-inference assumptions plus the modeling choice of representing the map as identity plus Cameron-Martin perturbation; no new entities are postulated.

axioms (2)

domain assumption: A transport map exists that pushes a reference measure to the target posterior.
Invoked when the method is introduced as learning an observation-dependent map.
domain assumption: The energy-distance objective can be minimized to yield a useful posterior approximation.
Central to the training formulation described in the abstract.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "The map is trained by minimizing an averaged energy-distance objective between the true posterior and the learned pushforward... likelihood-free, requiring only joint samples"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "T_θ(u; y) = u + C^{1/2} S_θ(u; y) ... perturbation lies in the Cameron–Martin space"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

