Particle-based Energetic Variational Inference

Chun Liu; Jiuhai Chen; Lulu Kang; Yiwei Wang

arXiv: 2004.06443 · v4 · submitted 2020-04-14 · stat.ML · cs.LG

Pith reviewed 2026-05-24 15:35 UTC · model grok-4.3

keywords: variational inference · particle-based variational inference · energetic variational inference · SVGD · KL-divergence · approximation-then-variation

The pith

Energetic variational inference derives existing particle methods including SVGD and introduces an approximation-then-variation scheme that reduces KL-divergence each step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces energetic variational inference as a framework that minimizes the variational inference objective using a prescribed energy-dissipation law. This framework derives many existing particle-based variational inference methods, including SVGD, and supports creation of new schemes. The highlighted new method approximates the density with particles first and then performs the variational update, preserving variational structure at the particle level. Experiments indicate this ordering produces larger KL-divergence reductions per iteration and better fidelity to the target distribution than some prior particle methods.

Core claim

The energetic variational inference framework, based on a prescribed energy-dissipation law, derives many particle-based variational inference methods including SVGD; a new approximation-then-variation scheme performs particle-based density approximation first then the variational procedure, maintains the variational structure at the particle level, and significantly decreases the KL-divergence in each iteration.

What carries the argument

Energetic variational inference (EVI) framework that minimizes the VI objective based on an energy-dissipation law, together with the approximation-then-variation ordering for particle schemes.
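
As a point of reference, one standard way to write such a law for the KL objective is sketched below. The symbols (density \rho, target \rho^* \propto e^{-V}, velocity field \mathbf{u}), the quadratic dissipation, and vanishing boundary terms are assumptions chosen for illustration; this page does not state the specific dissipation functional the paper prescribes.

```latex
% Assumed setting: the density is transported by a velocity field u
\partial_t \rho + \nabla\cdot(\rho\,\mathbf{u}) = 0,
\qquad
\mathcal{F}[\rho] = \mathrm{KL}(\rho\,\|\,\rho^*) = \int \rho \ln\frac{\rho}{\rho^*}\,dx .

% Prescribed energy-dissipation law (quadratic dissipation, assumed)
\frac{d}{dt}\mathcal{F}[\rho] = -\int \rho\,|\mathbf{u}|^2\,dx .

% Chain rule, continuity equation, and integration by parts
\frac{d}{dt}\mathcal{F}[\rho]
  = \int \partial_t\rho\,\Big(\ln\frac{\rho}{\rho^*} + 1\Big)\,dx
  = \int \rho\,\mathbf{u}\cdot\nabla\ln\frac{\rho}{\rho^*}\,dx .

% A velocity field consistent with both expressions
\mathbf{u} = -\nabla\ln\frac{\rho}{\rho^*} = -\nabla(\ln\rho + V)
\quad\Longrightarrow\quad
\partial_t\rho = \Delta\rho + \nabla\cdot(\rho\,\nabla V).
```

Under these assumptions the resulting evolution is the Fokker-Planck equation, whose stationary density is the target \rho^*.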

Load-bearing premise

Performing the particle-based density approximation first and the variational update second preserves the variational structure at the particle level.
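
The ordering can be made concrete with a small sketch. It assumes a Gaussian kernel density estimate as the particle approximation, so the KL objective becomes a function F_h of the particle positions (up to an additive constant), and only then differentiates that function exactly. The kernel choice, the bandwidth `h`, the explicit step with backtracking, and all function names are illustrative assumptions; this page does not describe the paper's actual discretization or time stepping.

```python
import numpy as np

def _kernel_terms(x, h):
    """Pairwise differences, Gaussian kernel matrix, and KDE values at the particles."""
    diff = x[:, None, :] - x[None, :, :]                    # diff[k, j] = x_k - x_j
    K = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * h**2))    # unnormalized kernel
    return diff, K, K.mean(axis=1)                          # S[k] = mean_j K[k, j]

def discrete_energy(x, V, h):
    """F_h(x) = mean_k [ log S_k + V(x_k) ]: KL with the density replaced by a KDE."""
    _, _, S = _kernel_terms(x, h)
    return float(np.mean(np.log(S) + V(x)))

def particle_velocity(x, grad_V, h):
    """Exact velocity -N * dF_h/dx_k, obtained by differentiating discrete_energy."""
    n = x.shape[0]
    diff, K, S = _kernel_terms(x, h)
    w = K * (1.0 / S[:, None] + 1.0 / S[None, :])           # symmetric pair weights
    repulsion = np.einsum("kj,kjd->kd", w, diff) / (n * h**2)
    return repulsion - grad_V(x)

def descend(x, V, grad_V, h=0.5, tau=0.1, n_steps=100):
    """Explicit steps along the velocity, halving the step until F_h does not increase."""
    energies = [discrete_energy(x, V, h)]
    for _ in range(n_steps):
        v = particle_velocity(x, grad_V, h)
        step = tau
        while step > 1e-10 and discrete_energy(x + step * v, V, h) > energies[-1]:
            step *= 0.5
        if step <= 1e-10:
            break                                           # no acceptable step found
        x = x + step * v
        energies.append(discrete_energy(x, V, h))
    return x, np.array(energies)

# Toy usage: standard normal target, V(x) = |x|^2 / 2 (an assumed test problem)
rng = np.random.default_rng(0)
x0 = 3.0 + 0.5 * rng.standard_normal((40, 2))
V = lambda x: 0.5 * np.sum(x**2, axis=-1)
grad_V = lambda x: x
x_final, energies = descend(x0, V, grad_V)
print(energies[0], energies[-1], x_final.mean(axis=0))
```

Because the velocity here is the exact gradient of the discrete energy, a small enough step cannot increase F_h, which is one reading of "variational structure at the particle level". Whether F_h tracks the true KL divergence for a given particle count and bandwidth is a separate question that the sketch does not address.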

What would settle it

An experiment or calculation showing that the new approximation-then-variation scheme fails to decrease KL-divergence more than existing particle methods or fails to improve fidelity to the target distribution.

Original abstract

We introduce a new variational inference (VI) framework, called energetic variational inference (EVI). It minimizes the VI objective function based on a prescribed energy-dissipation law. Using the EVI framework, we can derive many existing Particle-based Variational Inference (ParVI) methods, including the popular Stein Variational Gradient Descent (SVGD) approach. More importantly, many new ParVI schemes can be created under this framework. For illustration, we propose a new particle-based EVI scheme, which performs the particle-based approximation of the density first and then uses the approximated density in the variational procedure, or "Approximation-then-Variation" for short. Thanks to this order of approximation and variation, the new scheme can maintain the variational structure at the particle level, and can significantly decrease the KL-divergence in each iteration. Numerical experiments show the proposed method outperforms some existing ParVI methods in terms of fidelity to the target distribution.
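
For readers unfamiliar with SVGD, the baseline the abstract refers to, the following sketch shows one common form of its update with a fixed-bandwidth RBF kernel. It follows the standard published update rule rather than the EVI derivation, and the bandwidth, step size, toy target, and function name are assumptions for illustration.

```python
import numpy as np

def svgd_step(x, grad_log_p, h=1.0, step=0.1):
    """One SVGD update for particles x of shape (N, d) with an RBF kernel."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]                     # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * h**2))     # K[i, j] = k(x_i, x_j)
    drive = K @ grad_log_p(x) / n                            # kernel-weighted score
    repulse = np.einsum("ij,ijd->id", K, diff) / (n * h**2)  # kernel-gradient term
    return x + step * (drive + repulse)

# Toy usage: standard normal target, so grad log p(x) = -x
rng = np.random.default_rng(0)
particles = 3.0 + 0.5 * rng.standard_normal((50, 2))
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)
print(particles.mean(axis=0), particles.std(axis=0))
```

The first term pulls particles toward high-probability regions and the second keeps them from collapsing onto one point. In practice the bandwidth is often chosen adaptively; a fixed value is used here only to keep the sketch short.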

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

EVI gives a clean energy-dissipation route to recover SVGD and similar ParVI methods, plus one new approximation-first scheme whose KL decrease needs explicit discrete verification.

The main takeaway is that this paper supplies an energy-dissipation principle that recovers several existing particle variational inference algorithms, including SVGD, and then uses that principle to motivate a new ordering: approximate the density with particles first, then perform the variational step on the approximation. That ordering is presented as the source of the claimed per-iteration KL reduction and the improved empirical fidelity. The framework itself is the clearest addition; it starts from a continuous law rather than from a direct KL objective and shows how multiple schemes fall out of the same dissipation structure. The experiments are straightforward and show the new scheme beating a few baselines on standard targets, which is useful evidence even if the margins are not dramatic.

The soft spot is exactly the one raised in the stress test. Inserting the particle approximation before the variation step can change the effective energy functional seen by the particles. The paper needs to demonstrate, via the weak form or the resulting velocity field, that the discrete update still corresponds to a gradient flow of the original energy (or a controllable perturbation of it) so that the KL decrease is inherited rather than assumed. If that step is only justified in the continuous limit, the central algorithmic claim rests on an unverified passage to the discrete case. Minor issues include the usual lack of ablation on kernel choice or particle count, but those are secondary.

This work is aimed at researchers already working on ParVI or Wasserstein-based inference who want a systematic way to generate new schemes. A reader in that group will find the derivations and the ordering idea worth examining. It is coherent enough on its own terms to merit a serious referee, provided the discrete consistency argument is tightened.

Referee Report

3 major / 2 minor

Summary. The paper introduces Energetic Variational Inference (EVI), a framework that derives particle-based variational inference (ParVI) methods, including SVGD, from a prescribed energy-dissipation law. It proposes a new 'Approximation-then-Variation' scheme that first approximates the density via particles and then applies the variational update. The paper claims that this order preserves the variational structure at the particle level, significantly decreases the KL divergence in each iteration, and outperforms some existing ParVI methods in numerical experiments.

Significance. If the consistency between the discrete particle scheme and the continuous energy-dissipation law holds, the work supplies a unifying derivation for existing ParVI algorithms and a new scheme whose per-iteration KL decrease is inherited from the underlying variational structure rather than imposed ad hoc. This would strengthen the theoretical grounding of particle-based inference and enable systematic construction of new methods with controllable dissipation properties.

major comments (3)

[Abstract / derivation of new scheme] The central claim that the Approximation-then-Variation scheme 'maintains the variational structure at the particle level' and 'can significantly decrease the KL-divergence in each iteration' (abstract) requires an explicit verification that the particle approximation of the density, when inserted before the variation step, produces a velocity field that remains the Wasserstein gradient of the same energy functional. The manuscript must supply the Euler-Lagrange equation or weak-form derivation showing that the approximated dissipation functional is consistent with the continuous EVI law up to controllable error; without this, the asserted KL decrease does not follow from the framework.
[Section deriving SVGD and other ParVI methods] The derivation that existing ParVI methods (including SVGD) arise from the EVI energy-dissipation law must be checked for parameter-free status. If the particle approximation step introduces kernel bandwidths or other tuning parameters that are fitted rather than prescribed by the dissipation law, the claim that EVI supplies a parameter-free unification is undermined.
[Numerical experiments section] Numerical experiments are cited as showing outperformance, yet the abstract supplies no error bars, convergence plots of KL divergence, or comparison against the continuous-time limit of the scheme. The manuscript should report the measured per-iteration KL decrease and confirm it is not an artifact of the chosen particle count or kernel.

minor comments (2)

Notation for the energy functional and dissipation potential should be introduced once and used consistently; the transition from continuous density to empirical measure needs an explicit symbol.
[Abstract] The abstract states the new scheme 'outperforms some existing ParVI methods' without naming the baselines or reporting quantitative metrics; this should be clarified in the abstract or moved to the results section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important points for strengthening the theoretical justification and experimental presentation. We address each major comment below and indicate planned revisions.

Referee: [Abstract / derivation of new scheme] The central claim that the Approximation-then-Variation scheme 'maintains the variational structure at the particle level' and 'can significantly decrease the KL-divergence in each iteration' (abstract) requires an explicit verification that the particle approximation of the density, when inserted before the variation step, produces a velocity field that remains the Wasserstein gradient of the same energy functional. The manuscript must supply the Euler-Lagrange equation or weak-form derivation showing that the approximated dissipation functional is consistent with the continuous EVI law up to controllable error; without this, the asserted KL decrease does not follow from the framework.

Authors: We agree that an explicit weak-form derivation is required to rigorously connect the particle scheme to the continuous energy-dissipation law. In the revised manuscript we will add the Euler-Lagrange derivation for the approximated dissipation functional, showing that the resulting velocity field is the Wasserstein gradient of the energy (up to a discretization error controlled by particle number and kernel width). This will directly establish the per-iteration KL decrease from the variational structure.
Revision: yes

Referee: [Section deriving SVGD and other ParVI methods] The derivation that existing ParVI methods (including SVGD) arise from the EVI energy-dissipation law must be checked for parameter-free status. If the particle approximation step introduces kernel bandwidths or other tuning parameters that are fitted rather than prescribed by the dissipation law, the claim that EVI supplies a parameter-free unification is undermined.

Authors: The EVI framework itself prescribes the form of the update directly from the energy-dissipation law without introducing extra parameters. Kernel bandwidths and similar quantities belong to the choice of particle approximation (as they do in the original SVGD derivation) and are not fitted by the EVI procedure. The unification claim concerns the variational origin of the dynamics, which remains parameter-free at the continuous level. We will insert a clarifying paragraph distinguishing the law from the approximation choices.
Revision: partial

Referee: [Numerical experiments section] Numerical experiments are cited as showing outperformance, yet the abstract supplies no error bars, convergence plots of KL divergence, or comparison against the continuous-time limit of the scheme. The manuscript should report the measured per-iteration KL decrease and confirm it is not an artifact of the chosen particle count or kernel.

Authors: We will expand the numerical section to include error bars from multiple independent runs, per-iteration KL-divergence trajectories, and a brief comparison with the continuous-time limit obtained by increasing particle count. These additions will demonstrate that the observed KL decrease is consistent with the theory and not an artifact of specific discretization parameters.
Revision: yes
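
One way such reporting could be organized is sketched below; it is not the authors' code. It assumes a standard normal target so that a Gaussian moment-matched KL estimate has a closed form, and it uses an unadjusted Langevin update purely as a hypothetical stand-in for whichever particle method is being evaluated. All function names are illustrative.

```python
import numpy as np

def gaussian_kl_to_std_normal(x):
    """KL( N(m, S) || N(0, I) ), with m and S the sample mean and covariance of x (N, d)."""
    d = x.shape[1]
    m = x.mean(axis=0)
    S = np.atleast_2d(np.cov(x, rowvar=False))
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (np.trace(S) + m @ m - d - logdet)

def toy_run(seed, n_particles=200, dim=2, n_iters=200, step=0.05):
    """Hypothetical stand-in sampler; returns the KL estimate at every iteration."""
    rng = np.random.default_rng(seed)
    x = 3.0 + 0.5 * rng.standard_normal((n_particles, dim))
    kl = [gaussian_kl_to_std_normal(x)]
    for _ in range(n_iters):
        # Unadjusted Langevin step for the target N(0, I); replace with the method under test.
        x = x - step * x + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        kl.append(gaussian_kl_to_std_normal(x))
    return np.array(kl)

def summarize(trajectories):
    """Per-iteration mean and standard error across independent runs."""
    t = np.asarray(trajectories)
    mean = t.mean(axis=0)
    sem = t.std(axis=0, ddof=1) / np.sqrt(t.shape[0])
    return mean, sem

runs = [toy_run(seed) for seed in range(10)]
mean_kl, sem_kl = summarize(runs)
for it in (0, 50, 200):
    print(f"iter {it:3d}: KL = {mean_kl[it]:.3f} +/- {sem_kl[it]:.3f}")
```

The moment-matched estimate only sees the first two moments of the particles, so it is a coarse diagnostic; for non-Gaussian targets a different estimator would be needed. Repeating the summary for several particle counts and bandwidths is one way to check the referee's concern about discretization artifacts.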

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in external energy-dissipation law

The paper introduces the EVI framework from a prescribed energy-dissipation law (external to any fitted quantities inside the manuscript) and shows that existing ParVI methods including SVGD can be recovered as special cases. The new approximation-then-variation scheme is explicitly defined by the ordering of operations; its claimed preservation of variational structure and KL decrease are asserted as consequences of that ordering and are supported by numerical experiments rather than by re-labeling a fit as a prediction. No load-bearing self-citation chain, uniqueness theorem imported from the same authors, or ansatz smuggled via prior work appears in the abstract or described derivation. The central claims therefore remain independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that variational objectives can be minimized via a prescribed energy-dissipation law; no free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: The variational inference objective can be minimized based on a prescribed energy-dissipation law.
This is the core premise of the EVI framework stated in the abstract.



[1] [1]

Journal of the Royal Statistical Society: Series B 28(1), 131–142 (1966)

Ali, S.M., Silvey, S.D.: A general class of coeﬃcients of di- vergence of one distribution from another. Journal of the Royal Statistical Society: Series B 28(1), 131–142 (1966)

work page 1966

[2] [2]

Manuscripta Mathematica 121(1), 1–50 (2006)

Ambrosio, L., Lisini, S., Savar´ e, G.: Stability of ﬂows as- sociated to gradient vector ﬁelds and convergence of iter- ated transport maps. Manuscripta Mathematica 121(1), 1–50 (2006)

work page 2006

[3] [3]

In: Advances in Neural Information Processing Systems, pp

Arbel, M., Korba, A., Salim, A., Gretton, A.: Maximum mean discrepancy gradient ﬂow. In: Advances in Neural Information Processing Systems, pp. 6484–6494 (2019)

work page 2019

[4] [4]

Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)

work page 1988

[5] [5]

Springer, New York (2006)

Bishop, C.M.: Pattern recognition and machine learning. Springer, New York (2006)

work page 2006

[6] [6]

Blei, D.M., Kucukelbir, A., McAuliﬀe, J.D.: Variational inference: A review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)

work page 2017

[7] [7]

Carrillo, J.A., Craig, K., Patacchini, F.S.: A blob method for diﬀusion. Calc. Var. Partial. Diﬀer. Equ. 58(2), 53 (2019)

work page 2019

[8] [8]

Carrillo, J.A., D¨ uring, B., Matthes, D., McCormick, D.S.: A Lagrangian scheme for the solution of nonlinear dif- fusion equations using moving simplex meshes. J. Sci. Comput. 75(3), 1463–1499 (2018)

work page 2018

[9] [9]

Nonlinear partial diﬀerential equations and hyperbolic wave phe- nomena 526, 37–51 (2010)

Carrillo, J.A., Lisini, S.: On the asymptotic behavior of the gradient ﬂow of a polyconvex functional. Nonlinear partial diﬀerential equations and hyperbolic wave phe- nomena 526, 37–51 (2010)

work page 2010

[10] [10]

The American Statistician 46(3), 167–174 (1992)

Casella, G., George, E.I.: Explaining the Gibbs sampler. The American Statistician 46(3), 167–174 (1992)

work page 1992

[11] [11]

A Unified Particle-Optimization Framework for Scalable Bayesian Sampling

Chen, C., Zhang, R., Wang, W., Li, B., Chen, L.: A uniﬁed particle-optimization framework for scalable Bayesian sampling. arXiv preprint arXiv:1805.11659 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

arXiv preprint arXiv:1901.08659 (2019)

Chen, P., Wu, K., Chen, J., O’Leary-Roseberry, T., Ghat- tas, O.: Projected stein variational newton: A fast and scalable Bayesian inference method in high dimensions. arXiv preprint arXiv:1901.08659 (2019)

work page arXiv 1901

[13] Dai, B., He, N., Dai, H., Song, L.: Provable Bayesian inference via particle mirror descent. In: Artificial Intelligence and Statistics, pp. 985–994 (2016)

[14] Degond, P., Mustieles, F.J.: A deterministic approximation of diffusion equations using particles. SIAM J. Sci. Comput. 11(2), 293–310 (1990)

[15] Detommaso, G., Cui, T., Marzouk, Y., Spantini, A., Scheichl, R.: A Stein variational Newton method. In: Advances in Neural Information Processing Systems, pp. 9169–9179 (2018)

[16] Du, Q., Feng, X.: The phase field method for geometric moving interfaces and their numerical approximations. arXiv preprint arXiv:1902.04924 (2019)

[17] Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195(2), 216–222 (1987)

[18] Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(Jul), 2121–2159 (2011)

[19] El Moselhy, T.A., Marzouk, Y.M.: Bayesian inference with optimal maps. J. Comput. Phys. 231(23), 7815–7850 (2012)

[20] Evans, L.C., Savin, O., Gangbo, W.: Diffeomorphisms and nonlinear heat flows. SIAM J. Math. Anal. 37(3), 737–751 (2005)

[21] Francois, D., Wertz, V., Verleysen, M., et al.: About the locality of kernels in high-dimensional spaces. In: International Symposium on Applied Stochastic Models and Data Analysis, pp. 238–245. Citeseer (2005)

[22] Frogner, C., Poggio, T.: Approximate inference with Wasserstein gradient flows. arXiv preprint arXiv:1806.04542 (2018)

[23] Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.: Bayesian data analysis. Chapman and Hall/CRC (2013)

[24] Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6(6), 721–741 (1984)

[25] Gershman, S.J., Hoffman, M.D., Blei, D.M.: Nonparametric variational inference. In: Proceedings of the 29th International Conference on Machine Learning, pp. 235–242 (2012)

[26] Giga, M.H., Kirshtein, A., Liu, C.: Variational modeling and complex fluids. Handbook of Mathematical Analysis in Mechanics of Viscous Fluids, pp. 1–41 (2017)

[27] Gonzalez, O., Stuart, A.M.: A first course in continuum mechanics. Cambridge University Press (2008)

[28] Haario, H., Saksman, E., Tamminen, J.: Adaptive proposal distribution for random walk Metropolis algorithm. Computational Statistics 14(3), 375–396 (1999)

[29] Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)

[30] Hohenberg, P.C., Halperin, B.I.: Theory of dynamic critical phenomena. Rev. Mod. Phys. 49(3), 435 (1977)

[31] Iserles, A.: A first course in the numerical analysis of differential equations. No. 44 in Cambridge Texts in Applied Mathematics. Cambridge University Press, New York (2009)

[32] Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. Machine Learning 37(2), 183–233 (1999)

[33] Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1–17 (1998)

[34] Kingma, D.P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., Welling, M.: Improved variational inference with inverse autoregressive flow. In: Advances in Neural Information Processing Systems, pp. 4743–4751 (2016)

[35] Lacombe, G., Mas-Gallic, S.: Presentation and analysis of a diffusion-velocity method. In: ESAIM: Proceedings, vol. 7, pp. 225–233. EDP Sciences (1999)

[36] Li, L., Liu, J.G., Liu, Z., Lu, J.: A stochastic version of Stein variational gradient descent for efficient sampling. arXiv preprint arXiv:1902.03394 (2019)

[37] Liu, C.: An introduction of elastic complex fluids: an energetic variational approach. In: Multi-Scale Phenomena in Complex Fluids: Modeling, Analysis and Numerical Simulation, pp. 286–337. World Scientific (2009)

[38] Liu, C., Wang, Y.: On Lagrangian schemes for porous medium type generalized diffusion equations: a discrete energetic variational approach. Journal of Computational Physics, p. 109566 (2020)

[39] Liu, C., Wang, Y.: A variational Lagrangian scheme for a phase field model: A discrete energetic variational approach. arXiv preprint arXiv:2003.10413 (2020)

[40] Liu, C., Zhu, J.: Riemannian Stein variational gradient descent for Bayesian inference. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

[41] Liu, C., Zhuo, J., Cheng, P., Zhang, R., Zhu, J.: Understanding and accelerating particle-based variational inference. In: International Conference on Machine Learning, pp. 4082–4092 (2019)

[42] Liu, Q.: Stein variational gradient descent as gradient flow. In: Advances in Neural Information Processing Systems, pp. 3115–3123 (2017)

[43] Liu, Q., Wang, D.: Stein variational gradient descent: A general purpose Bayesian inference algorithm. In: Advances in Neural Information Processing Systems, pp. 2378–2386 (2016)

[44] Lu, J., Lu, Y., Nolen, J.: Scaling limit of the Stein variational gradient descent: The mean field regime. SIAM J. Math. Anal. 51(2), 648–671 (2019)

[45] MacKay, D.J.: Information theory, inference and learning algorithms. Cambridge University Press (2003)

[46] Matthes, D., Plazotta, S.: A variational formulation of the BDF2 method for metric gradient flows. ESAIM: Mathematical Modelling and Numerical Analysis 53(1), 145–172 (2019)

[47] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)

[48] Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Müller, K.R.: Fisher discriminant analysis with kernels. In: Neural networks for signal processing IX: Proceedings of the 1999 IEEE signal processing society workshop, pp. 41–48. IEEE (1999)

[49] Murphy, K.P.: Machine learning: a probabilistic perspective. MIT Press (2012)

[50] Neal, R.M.: Probabilistic inference using Markov chain Monte Carlo methods. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada (1993)

[51] Neal, R.M., Hinton, G.E.: A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Learning in Graphical Models, pp. 355–368. Springer (1998)

[52] Onsager, L.: Reciprocal relations in irreversible processes. I. Phys. Rev. 37(4), 405 (1931)

[53] Onsager, L.: Reciprocal relations in irreversible processes. II. Phys. Rev. 38(12), 2265 (1931)

[54] Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S., Lakshminarayanan, B.: Normalizing flows for probabilistic modeling and inference. arXiv preprint arXiv:1912.02762 (2019)

[55] Parisi, G.: Correlation functions and computer simulations. Nuclear Physics B 180(3), 378–384 (1981)

[56] Rayleigh, L.: Note on the numerical calculation of the roots of fluctuating functions. Proceedings of the London Mathematical Society 1(1), 119–124 (1873)

[57] Rezende, D.J., Mohamed, S.: Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770 (2015)

[58] Roberts, G.O., Tweedie, R.L., et al.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2(4), 341–363 (1996)

[59] Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

[60] Rossky, P.J., Doll, J.D., Friedman, H.L.: Brownian dynamics as smart Monte Carlo simulation. J. Chem. Phys. 69(10), 4628–4633 (1978)

[61] Salimans, T., Kingma, D., Welling, M.: Markov chain Monte Carlo and variational inference: Bridging the gap. In: International Conference on Machine Learning, pp. 1218–1226 (2015)

[62] Salman, H., Yadollahpour, P., Fletcher, T., Batmanghelich, K.: Deep diffeomorphic normalizing flows. arXiv preprint arXiv:1810.03256 (2018)

[63] Santambrogio, F.: {Euclidean, metric, and Wasserstein} gradient flows: an overview. Bull. Math. Sci. 7(1), 87–154 (2017)

[64] Sonoda, S., Murata, N.: Transport analysis of infinitely deep neural network. The Journal of Machine Learning Research 20(1), 31–82 (2019)

[65] Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numer. 19, 451–559 (2010)

[66] Tabak, E.G., Vanden-Eijnden, E., et al.: Density estimation by dual ascent of the log-likelihood. Commun. Math. Sci. 8(1), 217–233 (2010)

[67] Temam, R., Miranville, A.: Mathematical modeling in continuum mechanics. Cambridge University Press (2005)

[68] Villani, C.: Optimal transport: old and new, vol. 338. Springer Science & Business Media (2008)

[69] Wainwright, M.J., Jordan, M.I., et al.: Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning 1(1–2), 1–305 (2008)

[70] Wang, D., Tang, Z., Bajaj, C., Liu, Q.: Stein variational gradient descent with matrix-valued kernels. In: Advances in Neural Information Processing Systems, pp. 7834–7844 (2019)

[71] Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning, pp. 681–688 (2011)