Accelerating Particle-based Energetic Variational Inference

Chun Liu; Lulu Kang; Xuelian Bao; Yiwei Wang

arxiv: 2504.03158 · v2 · pith:HQHEJTYDnew · submitted 2025-04-04 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-22 21:51 UTC · model grok-4.3

keywords particle-based variational inference · energetic variational inference · energy quadratization · operator splitting · gradient flows · KL divergence minimization · sampling algorithms

The pith

A particle variational inference method uses energy quadratization and operator splitting to avoid repeated inter-particle calculations inside each time step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an accelerated particle-based method for energetic variational inference that starts from the same discretization-then-variation particle dynamics as the earlier implicit scheme. By inserting energy quadratization and an operator split, the new algorithm updates particles without recomputing pairwise interaction terms at every sub-step. This change lowers the per-iteration cost while the variational structure and a stability mechanism remain intact. A reader cares because standard particle variational inference methods become expensive precisely when the number of particles grows and each particle must interact with all others.
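To make the cost argument concrete, the sketch below evaluates a kernel-regularized particle KL energy of the kind used in particle-based variational inference. The specific form (Gaussian kernel with bandwidth `h`, potential `V` with target density proportional to exp(-V)) and the names `particle_kl_energy`, `V`, and `h` are assumptions made for illustration; they are not taken from the paper. The only point is that the interaction term touches every pair of particles, so a single evaluation costs on the order of N² kernel evaluations.

```python
import numpy as np

def particle_kl_energy(x, V, h=0.5):
    """Illustrative kernel-regularized KL energy for N particles (assumed form).

    x : (N, d) array of particle positions
    V : callable mapping (N, d) -> (N,), the negative log target density
    h : Gaussian kernel bandwidth (hypothetical parameter)
    """
    n, d = x.shape
    # Pairwise differences: this (N, N, d) array is the O(N^2) bottleneck.
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff**2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * h**2)) / (2.0 * np.pi * h**2) ** (d / 2.0)
    # Kernel density estimate at each particle, then the discrete KL energy.
    density = K.mean(axis=1)
    energy = np.mean(np.log(density) + V(x))
    return energy, K
```

Any scheme that re-evaluates this term inside an inner solver pays the N² cost once per inner iteration, which is the expense the accelerated method is described as avoiding.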

Core claim

The authors show that energy quadratization combined with operator splitting applied to the variational-preserving particle dynamics yields a scheme that drives particles toward the target distribution, retains a meaningful stability mechanism, and avoids repeated evaluation of inter-particle interaction terms within each time step, thereby reducing computational cost relative to the original implicit Euler discretization of EVI-Im.

What carries the argument

Energy quadratization and operator splitting applied to the discretization-then-variation particle dynamics.
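As a rough illustration of energy quadratization, the sketch below applies a linearly implicit quadratized step to a plain gradient flow on a toy objective. It is a generic textbook-style construction with one scalar auxiliary variable, not the paper's ImEQ algorithm; the function name `eq_gradient_flow`, the shift constant `C`, and the toy objective in the usage note are assumptions. The auxiliary variable `r` stands in for sqrt(F(z) + C), and each step needs only one gradient evaluation.

```python
import numpy as np

def eq_gradient_flow(F, grad_F, z0, dt=0.1, C=1.0, steps=200):
    """Generic energy-quadratization step for dz/dt = -grad F(z) (illustrative).

    Requires F(z) + C > 0. Returns the final z and the history of the
    modified energy r**2, which is non-increasing for any dt > 0.
    """
    z = np.asarray(z0, dtype=float).copy()
    r = np.sqrt(F(z) + C)              # auxiliary variable, r = sqrt(F + C)
    history = [r**2]
    for _ in range(steps):
        # g = grad(sqrt(F + C)), evaluated explicitly at the current z.
        g = grad_F(z) / (2.0 * np.sqrt(F(z) + C))
        # r is treated implicitly; the update has a closed form.
        r = r / (1.0 + 2.0 * dt * np.dot(g, g))
        z = z - 2.0 * dt * r * g
        history.append(r**2)
    return z, np.array(history)
```

For example, with `F = lambda z: 0.5 * np.dot(z, z)` and `grad_F = lambda z: z`, the recorded modified energy never increases even for a step as large as `dt=5.0`, although a large step gives no guarantee on accuracy.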

If this is right

The algorithm achieves lower computational cost than EVI-Im by skipping repeated interaction evaluations inside each time step.
The method still drives particles toward the target distribution while keeping a stability mechanism.
Numerical experiments show competitive performance against existing particle variational inference approaches.
The same framework extends to other gradient-based sampling techniques.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same splitting strategy might be tried on other implicit particle schemes that suffer from pairwise cost.
Larger time steps could become practical if the split steps remain stable, which would further reduce total wall-clock time.
The approach may translate to continuous-time formulations beyond discrete particle systems.
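The splitting idea behind these extensions can be seen on a toy problem. The sketch below applies first-order (Lie) operator splitting to the scalar ODE z' = -z - z³, chosen only because both sub-flows and the full solution have closed forms; the ODE and all function names are hypothetical and unrelated to the paper's particle system.

```python
import math

def flow_linear(z, dt):
    """Exact flow of z' = -z over a time dt."""
    return z * math.exp(-dt)

def flow_cubic(z, dt):
    """Exact flow of z' = -z**3 over a time dt."""
    return z / math.sqrt(1.0 + 2.0 * dt * z * z)

def lie_splitting(z0, T, dt):
    """Advance z' = -z - z**3 by composing the two exact sub-flows each step."""
    z = z0
    for _ in range(round(T / dt)):
        z = flow_cubic(flow_linear(z, dt), dt)
    return z

def exact_solution(z0, T):
    """Closed-form solution of z' = -z - z**3 for nonzero z0."""
    return 1.0 / math.sqrt((1.0 / z0**2 + 1.0) * math.exp(2.0 * T) - 1.0)
```

Each sub-step is cheap and stable on its own; the price is a splitting error that shrinks roughly in proportion to the step size, which is the trade-off any larger-time-step variant would have to manage.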

Load-bearing premise

Energy quadratization and operator splitting can be inserted into the variational-preserving particle dynamics without destroying the key variational properties or stability of the original implicit scheme.
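For reference, this is the standard stability argument for a generic quadratized gradient flow dz/dt = -∇F(z). It is offered as a sketch of what "retaining stability" usually means for such schemes, not as the paper's own estimate: the symbols F, C, r, g, and τ are illustrative, and the paper is quoted further down as quadratizing only part of the free energy, so its actual bound may differ.

```latex
\begin{aligned}
& r = \sqrt{F(z) + C}, \qquad g(z) = \frac{\nabla F(z)}{2\sqrt{F(z) + C}}, \\
& \frac{z^{n+1} - z^{n}}{\tau} = -2\, r^{n+1} g(z^{n}), \qquad
  r^{n+1} - r^{n} = g(z^{n}) \cdot \bigl(z^{n+1} - z^{n}\bigr), \\
& \Longrightarrow \quad r^{n+1}\bigl(r^{n+1} - r^{n}\bigr)
  = -2\tau\, (r^{n+1})^{2}\, \lvert g(z^{n}) \rvert^{2} \le 0, \\
& \Longrightarrow \quad (r^{n+1})^{2} - (r^{n})^{2}
  = -\bigl(r^{n+1} - r^{n}\bigr)^{2} - 4\tau\, (r^{n+1})^{2}\, \lvert g(z^{n}) \rvert^{2} \le 0 .
\end{aligned}
```

The last line holds for every τ > 0, but it controls the modified energy r² rather than the original F(z); whether the two stay close is exactly what the premise above leaves open.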

What would settle it

A numerical test in which the new scheme produces a visibly different stationary distribution or loses stability at a step size where the original implicit method remains stable would falsify the preservation claim.

Figures

Figures reproduced from arXiv: 2504.03158 by Chun Liu, Lulu Kang, Xuelian Bao, Yiwei Wang.

**Figure 1.** “Double-banana” (a), “Star” (b) and “Eight-component” (c) cases: particles obtained by the ImEQ method after 200 iterations (left); plots of MMD² (middle) and KL divergence (right) with respect to CPU time for different methods. For the AdaGrad and EVI-Im methods, lr = 0.1 in all cases. For the ImEQ method, lr = 0.01 for the “Double-banana” and “Star” cases, while lr = 0.1 for the “Eight-component” case. For AEG…

**Figure 2.** (a): Particles obtained by the ImEQ method after 200 iterations with lr = 0.1 (top) and the AEGD method after 2000 iterations with lr = 0.1 (bottom). (b): KL divergence with respect to CPU time for different learning rates for ImEQ. (c): KL divergence with respect to CPU time for different learning rates for AEGD. … applications, where the mean or variance of the target distribution is often unknown o…

**Figure 3.** “Star” case with the initial distribution set as a Gaussian distribution with a nonzero mean. (a)-(b): Particles obtained by AdaGrad and AEGD at iterations 500, 1000, 2000, and 5000 (from left to right). (c)-(d): Particles obtained by EVI-Im and ImEQ at iterations 20, 100, 200, and 500 (from left to right). (e)-(f): Plots of MMD² and KL divergence as functions of CPU time for different methods.

**Figure 4.** The train log-likelihood and test accuracy of the “Diabetes” (a), “Image” (b) and “Covertype” (c) datasets returned by different methods. … advantage over EVI-Im, achieving nearly the same log-likelihood with less CPU time. This is consistent with the results from the toy examples, where the ImEQ method outperforms EVI-Im when the particle number N ≥ 100. We then consider a large dataset “Covertype” [40], wh…

**Figure 5.** Boxplot of RMSE (left) and predictive log-likelihood (right) for different datasets: (a) “Yacht Hydrodynamics”, (b) “Boston Housing”, and (c) “Concrete Data”.
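Several of the captions above report a squared maximum mean discrepancy (MMD²) between the particles and the target. A minimal sketch of the usual biased sample estimate with a Gaussian kernel follows. The bandwidth, the function names, and the choice of the biased (V-statistic) form are assumptions, since the captions do not say which estimator or kernel the paper uses.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and b (m, d)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x and y."""
    return (gaussian_kernel(x, x, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean()
            - 2.0 * gaussian_kernel(x, y, bandwidth).mean())
```

Here `x` would hold the particles and `y` reference samples from the target; the estimate is zero when the two sets coincide and grows as they separate.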

Original abstract

In this work, we propose a new particle-based variational inference (ParVI) method for accelerating the Energetic Variational Inference with Implicit scheme (EVI-Im) introduced in Ref. [41]. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, the proposed method efficiently drives particles towards the target distribution, while retaining a meaningful stability mechanism. Unlike EVI-Im, which employs the implicit Euler method to solve variational-preserving particle dynamics obtained from a "discretization-then-variation" approach for minimizing the Kullback–Leibler divergence, the proposed algorithm avoids repeated evaluation of inter-particle interaction terms within each time step, significantly reducing computational cost. The framework is also extensible to other gradient-based sampling techniques. Through several numerical experiments, we demonstrate that the proposed method achieves competitive performance compared with existing ParVI approaches, while offering advantages in efficiency and robustness in certain regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies energy quadratization and operator splitting to cut the per-step cost of EVI-Im particle dynamics, but the advance stays inside an existing subfield.

The core move is to replace the implicit Euler solve in the original EVI-Im scheme with a quadratized energy plus splitting. This removes the need to recompute pairwise interaction terms at every inner iteration, which is the main source of the claimed speed-up. The abstract states that the new scheme still drives particles to the target while keeping a stability mechanism, and the experiments are said to show competitive accuracy with better runtime in some regimes. That efficiency claim is the only concrete new element relative to the 2021 Wang et al. paper it cites. The technique itself is standard in the gradient-flow literature, so the novelty is really the targeted application rather than a fresh theoretical device.

On the positive side, the cost reduction is easy to understand and the framework is noted as extensible, which could be useful for other particle methods. The soft spot is that the abstract gives no derivation showing the split scheme still inherits the variational structure or the original stability bound; without that, it is unclear whether the acceleration comes at the price of weaker guarantees or just a heuristic trade-off. The experiments are described only at the level of “competitive performance,” with no detail on controls, run counts, or failure modes.

This is the kind of incremental methods paper that people already working on particle variational inference might want to look at for implementation ideas. It is not broad enough or theoretically tight enough to change how the wider field thinks about variational inference. I would send it to review because the efficiency claim is testable and the method is reproducible in principle, but I would expect referees to press hard on the preservation of the variational properties.
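The inner-iteration cost mentioned in the letter can be illustrated with a toy implicit Euler step. The sketch below solves the implicit equation y = z - dt·∇F(y) by plain gradient descent on the associated proximal objective and counts gradient evaluations. The inner solver, the quadratic test problem in the usage note, and all names are assumptions for illustration; the text does not say which inner solver EVI-Im uses.

```python
import numpy as np

def implicit_euler_step(grad_F, z, dt, lr, tol=1e-10, max_inner=10_000):
    """One implicit Euler step for dz/dt = -grad F(z), solved iteratively.

    Minimizes J(y) = |y - z|^2 / (2 dt) + F(y) by gradient descent and
    returns the minimizer together with the number of grad_F evaluations.
    """
    y = np.asarray(z, dtype=float).copy()
    n_grad = 0
    for _ in range(max_inner):
        residual = (y - z) / dt + grad_F(y)   # gradient of J at y
        n_grad += 1
        if np.linalg.norm(residual) < tol:
            break
        y = y - lr * residual
    return y, n_grad
```

On a mildly ill-conditioned quadratic such as F(y) = ½ yᵀ diag(1, 10) y, this loop needs many gradient evaluations per time step. In a particle method each of those evaluations would include the pairwise interaction term, whereas a linearly implicit quadratized step evaluates the gradient once per time step.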

Referee Report

0 major / 2 minor

Summary. The paper proposes an accelerated particle-based variational inference (ParVI) method for the Energetic Variational Inference with Implicit scheme (EVI-Im). It applies energy quadratization and operator splitting to the variational-preserving particle dynamics obtained from a discretization-then-variation approach for KL divergence minimization. The authors claim that the method avoids repeated evaluation of inter-particle interaction terms within each time step, reducing computational cost while still driving particles to the target distribution and retaining a stability mechanism. Numerical experiments are reported to show competitive performance relative to existing ParVI approaches, and the authors suggest the framework extends to other gradient-based sampling techniques.

Significance. If the central claims hold—specifically that the proposed acceleration preserves the variational structure and stability of the original EVI-Im without introducing hidden parameters or circularity—this would represent a practical advance in scalable particle-based sampling. The explicit focus on computational efficiency via operator splitting, combined with the extensibility claim, addresses a recurring bottleneck in ParVI methods and could enable broader adoption in high-dimensional inference tasks.

minor comments (2)

The abstract references numerical experiments but provides no details on the specific test distributions, dimensions, or baseline methods used; adding a sentence summarizing the experimental setup would improve clarity for readers.
Notation for the energy quadratization step and the splitting operator could be introduced with a brief equation reference in the introduction to aid readers unfamiliar with the cited EVI-Im work.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of the manuscript and for recognizing the potential practical advance offered by the proposed acceleration of EVI-Im. We are encouraged by the note that the focus on computational efficiency via operator splitting addresses a recurring bottleneck in ParVI methods. The report lists no major comments, so there is no point-by-point response. We remain available to address any additional points the referee or editor may raise.

Circularity Check

0 steps flagged

No significant circularity

The derivation applies standard energy quadratization and operator splitting to the existing EVI-Im particle dynamics (cited from prior work) to obtain an accelerated scheme. No step reduces a claimed prediction or stability property to a fitted parameter or self-defined quantity by construction; the cost reduction follows directly from avoiding repeated inter-particle evaluations per the splitting. Numerical experiments supply independent empirical checks. The self-citation to the base EVI-Im method is not load-bearing for the acceleration claim itself, and the framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no specific free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5689 in / 1006 out tokens · 40899 ms · 2026-05-22T21:51:51.937458+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


we propose a new algorithm called ImEQ ... which integrates the energy quadratization technique into gradient flows ... only applies energetic quadratization to some part of the free energy instead of its entirety
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear


$\tilde{F}(z, r) = r^2 + H(z)$ ... unconditionally energy stable in terms of the modified energy

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1] M. Arbel, A. Korba, A. Salim, and A. Gretton, Maximum mean discrepancy gradient flow, in Proceedings of the 33rd International Conference on Neural Information Processing Systems, Curran Associates Inc., 2019, pp. 6484–6494.
[2] J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8 (1988), pp. 141–148.
[3] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, Variational inference: A review for statisticians, J. Am. Stat. Assoc., 112 (2017), pp. 859–877.
[4] J. A. Carrillo, K. Craig, and F. S. Patacchini, A blob method for diffusion, Calc. Var. Partial. Differ. Equ., 58, 53 pp. (2019).
[5] J. A. Carrillo, S. Jin, L. Li, and Y. Zhu, A consensus-based global optimization method for high dimensional machine learning problems, ESAIM: COCV, 27, Paper No. S5, 22 pp. (2021).
[6] G. Casella and E. I. George, Explaining the Gibbs sampler, The American Statistician, 46 (1992), pp. 167–174.
[7] C. Chen, R. Zhang, W. Wang, B. Li, and L. Chen, A unified particle-optimization framework for scalable Bayesian sampling, in Conference on Uncertainty in Artificial Intelligence, Monterey, California, USA, 2018, 10 pp.
[8] P. Chen, K. Wu, J. Chen, T. O’Leary-Roseberry, and O. Ghattas, Projected Stein variational Newton: A fast and scalable Bayesian inference method in high dimensions, in 33rd Conference on Neural Information Processing Systems, Vancouver, Canada, 2019, 10 pp.
[9] S. Chen, Z. Ding, and Q. Li, Bayesian sampling using interacting particles, in Active Particles, Volume 4, Springer, 2024, pp. 175–215.
[10] Y. Chen, Y. Wang, L. Kang, and C. Liu, A deterministic sampling method via maximum mean discrepancy flow with adaptive kernel, preprint, arXiv:2111.10722v2, (2022).
[11] G. Detommaso, T. Cui, Y. Marzouk, A. Spantini, and R. Scheichl, A Stein variational Newton method, in 32nd Conference on Neural Information Processing Systems, Montréal, Canada, 2018, pp. 9169–9179.
[12] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, Hybrid Monte Carlo, Phys. Lett. B, 195 (1987), pp. 216–222.
[13] J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res., 12 (2011), pp. 2121–2159.
[14] W. E, C. Ma, and L. Wu, Machine learning from a continuous viewpoint, I, Sci. China Math., 63 (2020), pp. 2233–2266.
[15] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell., PAMI-6 (1984), pp. 721–741.
[16] M.-H. Giga, A. Kirshtein, and C. Liu, Variational modeling and complex fluids, Handbook of Mathematical Analysis in Mechanics of Viscous Fluids, (2017), pp. 1–41.
[17] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, A kernel two-sample test, J. Mach. Learn. Res., 13 (2012), pp. 723–773.
[18] W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57 (1970), pp. 97–109.
[19] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An introduction to variational methods for graphical models, Machine Learning, 37 (1999), pp. 183–233.
[20] C. Liu and J. Zhu, Riemannian Stein variational gradient descent for Bayesian inference, in Proceedings of the AAAI Conference on Artificial Intelligence, Volume 32, 2018, pp. 3627–3634.
[21] C. Liu, J. Zhuo, P. Cheng, R. Zhang, and J. Zhu, Understanding and accelerating particle-based variational inference, in Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 4082–4092.
[22] H. Liu, L. Nurbekyan, X. Tian, and Y. Yang, Adaptive preconditioned gradient descent with energy, arXiv preprint arXiv:2310.06733, (2023).
[23] H. Liu and X. Tian, An adaptive gradient method with energy and momentum, Ann. Appl. Math., 38 (2022), pp. 183–222.
[24] H. Liu and X. Tian, Dynamic behavior for a gradient algorithm with energy and momentum, arXiv preprint arXiv:2203.12199, (2022).
[25] H. Liu and X. Tian, AEGD: Adaptive gradient descent with energy, Numer. Algebra, Control. Optim., 15 (2025), pp. 315–340.
[26] Q. Liu, Stein variational gradient descent as gradient flow, in 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 3115–3123.
[27] Q. Liu and D. Wang, Stein variational gradient descent: A general purpose Bayesian inference algorithm, in 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 2378–2386.
[28] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys., 21 (1953), pp. 1087–1092.
[29] R. M. Neal, Probabilistic inference using Markov chain Monte Carlo methods, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, 1993.
[30] R. M. Neal and G. E. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, in Learning in Graphical Models, Vol. 89, Springer, 1998, pp. 355–368.
[31] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan, Normalizing flows for probabilistic modeling and inference, J. Mach. Learn. Res., 22 (2021), pp. 1–64.
[32] G. Parisi, Correlation functions and computer simulations, Nucl. Phys. B, 180 (1981), pp. 378–384.
[33] S. Reich and S. Weissmann, Fokker–Planck particle systems for Bayesian inference: Computational approaches, SIAM/ASA J. Uncertain., 9 (2021), pp. 446–482.
[34] D. J. Rezende and S. Mohamed, Variational inference with normalizing flows, in Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015, pp. 1530–1538.
[35] G. O. Roberts, R. L. Tweedie, et al., Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, 2 (1996), pp. 341–363.
[36] P. J. Rossky, J. D. Doll, and H. L. Friedman, Brownian dynamics as smart Monte Carlo simulation, J. Chem. Phys., 69 (1978), pp. 4628–4633.
[37] G. Rotskoff and E. Vanden-Eijnden, Trainability and accuracy of artificial neural networks: An interacting particle system approach, Commun. Pure Appl. Math., 75 (2022), pp. 1889–1935.
[38] J. Shen, J. Xu, and J. Yang, The scalar auxiliary variable (SAV) approach for gradient flows, J. Comput. Phys., 353 (2018), pp. 407–416.
[39] M. J. Wainwright and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Now Foundations and Trends, 2008.
[40] D. Wang, Z. Tang, C. Bajaj, and Q. Liu, Stein variational gradient descent with matrix-valued kernels, in Proceedings of the 33rd Conference on Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 7836–7846.
[41] Y. Wang, J. Chen, C. Liu, and L. Kang, Particle-based energetic variational inference, Stat. Comput., 31, Paper No. 34, 17 pp. (2021).
[42] Y. Wang and C. Liu, Some recent advances in energetic variational approaches, Entropy, 24, Paper No. 721, 26 pp. (2022).
[43] M. Welling and Y. W. Teh, Bayesian learning via stochastic gradient Langevin dynamics, in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011, pp. 681–688.
[44] X. Yang, Linear, first and second-order, unconditionally energy stable numerical schemes for the phase field model of homopolymer blends, J. Comput. Phys., 327 (2016), pp. 294–316.
[45] J. Zhang, S. Zhang, J. Shen, and G. Lin, Energy-dissipative evolutionary deep operator neural networks, J. Comput. Phys., 498, Paper No. 112638, 17 pp. (2024).
[46] J. Zhao, Q. Wang, and X. Yang, Numerical approximations for a phase field dendritic crystal growth model based on the invariant energy quadratization approach, Internat. J. Numer. Methods Engrg., 110 (2017), pp. 279–300.
