arxiv: 2604.11519 · v1 · submitted 2026-04-13 · 💻 cs.LG · math-ph · math.MP

Generative Path-Finding Method for Wasserstein Gradient Flow

Pith reviewed 2026-05-10 15:27 UTC · model grok-4.3

classification 💻 cs.LG · math-ph · math.MP
keywords Wasserstein gradient flow · generative path finding · normalizing flows · large deviation theory · Fokker-Planck · action functional · probability transport

The pith

A generative framework learns paths in Wasserstein space by minimizing an action loss from large deviation theory using normalizing flows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GenWGP, a method that computes the full trajectory of a probability distribution evolving under Wasserstein gradient flow from an initial state to equilibrium. It does this by training a generative model to minimize a path loss that comes from a geometric action functional based on Dawson-Gartner large deviation theory. The approach uses normalizing flows to create a curve where the intrinsic speed is constant between layers, keeping successive distributions roughly equidistant in the Wasserstein metric. This setup allows the method to work with coarse discretizations of about a dozen points while remaining stable and independent of specific time or geometry choices. A reader would care because it offers a practical way to simulate these flows in high dimensions without the usual numerical difficulties of time stepping or particle methods.

Core claim

GenWGP learns a generative flow that transports mass from an initial density to an unknown equilibrium distribution by minimizing a path loss derived from a geometric action functional motivated by Dawson-Gartner large deviation theory for empirical distributions of interacting diffusion systems. Both a finite-horizon action under physical time parametrization and a reparameterization-invariant geometric action based on Wasserstein arclength are formulated. Using normalizing flows, it computes a geometric curve toward equilibrium while enforcing approximately constant intrinsic speed between adjacent network layers, so that discretized distributions remain nearly equidistant in the Wasserstein metric along the path.

What carries the argument

The geometric action functional based on Wasserstein arclength, minimized via normalizing flows to enforce constant intrinsic speed between layers in the generative path.
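
A hedged sketch of how such an action pair is usually written may help here. It follows the standard Freidlin-Wentzell and geometric minimum action construction; the normalization constant $c > 0$, the symbol $\operatorname{grad}_W \mathcal{F}$ for the Wasserstein gradient of the free energy, and the unit parameter interval are assumptions of this sketch, not formulas quoted from the paper.

```latex
% Finite-horizon action in physical time (constant c depends on the noise normalization):
S_T[\rho] = c \int_0^T \bigl\| \partial_t \rho_t + \operatorname{grad}_W \mathcal{F}(\rho_t) \bigr\|_{-1,\rho_t}^2 \, dt .

% Expand the square and use a^2 + b^2 \ge 2ab (equality iff the two norms agree):
\bigl\| \partial_t \rho + \operatorname{grad}_W \mathcal{F} \bigr\|^2
  = \|\partial_t \rho\|^2 + \|\operatorname{grad}_W \mathcal{F}\|^2
    + 2 \langle \partial_t \rho, \operatorname{grad}_W \mathcal{F} \rangle
  \ge 2 \bigl( \|\partial_t \rho\| \, \|\operatorname{grad}_W \mathcal{F}\|
    + \langle \partial_t \rho, \operatorname{grad}_W \mathcal{F} \rangle \bigr) .

% Geometric action over curves s \mapsto \rho_s, s \in [0,1]:
\hat S[\rho] = 2c \int_0^1 \Bigl( \|\rho_s'\|_{-1,\rho_s} \, \|\operatorname{grad}_W \mathcal{F}(\rho_s)\|_{-1,\rho_s}
    + \langle \rho_s', \operatorname{grad}_W \mathcal{F}(\rho_s) \rangle_{-1,\rho_s} \Bigr) \, ds .
```

The integrand of $\hat S$ is homogeneous of degree one in $\rho_s'$, so its value does not change under reparametrization of the curve. By the Cauchy-Schwarz inequality it is non-negative and vanishes exactly when $\rho_s'$ is a non-negative multiple of $-\operatorname{grad}_W \mathcal{F}(\rho_s)$, that is, when the curve follows the gradient-flow direction at any speed.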

If this is right

  • GenWGP matches or exceeds high fidelity reference solutions with only about a dozen discretization points.
  • It enables stable training largely independent of temporal or geometric discretization.
  • The method applies to Fokker-Planck and aggregation type problems while capturing complex dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The constant intrinsic speed enforcement could be adapted to other generative architectures for learning continuous dynamics in distribution space.
  • This suggests large deviation principles may guide loss design for approximating gradient flows without fine time discretization.
  • If it generalizes, GenWGP could offer a scalable alternative to particle simulations for high-dimensional equilibrium sampling.

Load-bearing premise

The path loss from Dawson-Gartner large deviation theory accurately encodes the Wasserstein gradient flow dynamics and terminal condition, with normalizing flows sufficiently expressive for the transport maps.

What would settle it

A direct comparison on a low-dimensional problem with a known analytic Wasserstein gradient flow solution, verifying whether the 12-point discretization produces equidistant Wasserstein distances and the correct terminal distribution.
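
A minimal sketch of the reference side of such a check, under stated assumptions: the toy problem (a 1D Ornstein-Uhlenbeck flow with a Gaussian initial law), the truncation time `t_end`, and all function names are illustrative choices, not taken from the paper. It builds 12 points that are equidistant in Wasserstein arclength along the exact path and recovers their physical times; a learned path could then be compared against these.

```python
import numpy as np

# Assumed toy problem (not from the paper): 1D Fokker-Planck equation with
# V(x) = x^2 / 2, i.e. dX = -X dt + sqrt(2) dB, started from N(m0, s0^2).
# The law stays Gaussian with the closed-form mean and standard deviation below.
def gaussian_path(t, m0=3.0, s0=0.5):
    mean = m0 * np.exp(-t)
    std = np.sqrt(1.0 + (s0**2 - 1.0) * np.exp(-2.0 * t))
    return mean, std

def w2_gaussian_1d(m_a, s_a, m_b, s_b):
    # Closed-form 2-Wasserstein distance between two 1D Gaussians.
    return np.sqrt((m_a - m_b) ** 2 + (s_a - s_b) ** 2)

def equal_arclength_times(n_points=12, t_end=10.0, n_fine=20001):
    # Accumulate Wasserstein arclength on a fine time grid, then invert it.
    t_fine = np.linspace(0.0, t_end, n_fine)
    mean, std = gaussian_path(t_fine)
    steps = w2_gaussian_1d(mean[:-1], std[:-1], mean[1:], std[1:])
    arclength = np.concatenate([[0.0], np.cumsum(steps)])
    targets = np.linspace(0.0, arclength[-1], n_points)
    return np.interp(targets, arclength, t_fine)

times = equal_arclength_times()
mean, std = gaussian_path(times)
neighbor_w2 = w2_gaussian_1d(mean[:-1], std[:-1], mean[1:], std[1:])
relative_spread = (neighbor_w2.max() - neighbor_w2.min()) / neighbor_w2.mean()

print("recovered physical times:", np.round(times, 3))
print("neighbor W2 distances:   ", np.round(neighbor_w2, 4))
print("relative spread:         ", relative_spread)
```

The recovered times cluster near zero, where the flow moves fast, and spread out near equilibrium, which is the kind of non-uniform time mesh the geometric formulation is meant to produce.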

Figures

Figures reproduced from arXiv: 2604.11519 by Chengyu Liu, Xiang Zhou.

Figure 1
Figure 1: Transport map learned by Algorithm 2 with K = 9 layers for the 2D isotropic Gaussian case. The flow map (indicated by the arrows) transports the initial density toward equilibrium while preserving isotropic structure.
Figure 2
Figure 2: Validation of the learned geometric path for the 2D isotropic Gaussian example. (a) Nearly constant segment lengths indicate approximate arc-length parametrization; (b) cosine alignment close to one is consistent with the gradient-flow direction; (c) the density snapshots at two selected times agree well with the exact solution at the recovered physical times.
Figure 3
Figure 3: Comparison between the physical-time formulation and the geometric formulation for the 2D isotropic Gaussian example. (a) The uniform time mesh for each layer in Algorithm 1 and the recovered physical time mesh in Algorithm 2; (b) the decay of free energy in physical time; (c) the decay of free energy across layers; (d) the d errors. 2D anisotropic diffusion. For $\mu = [3, 3]^\top$ and $\Sigma = \operatorname{diag}(1, 0.25)$, …
Figure 4
Figure 4: Comparison in the anisotropic Gaussian case. This example examines whether the method remains accurate in a moderate-dimensional setting with coupled and anisotropic substructures. The learned flow from Algorithm 2 captures the expected rotated and anisotropic components in the selected two-dimensional projections; see …
Figure 5
Figure 5: 2D projections of the terminal distribution for the 10D (dimension 0 to 9) block-structured Gaussian example. 5.1.2. Non-Convex Potential: 10D Styblinski-Tang Potential. We next consider the 10D Styblinski-Tang potential, given by a sum of identical one-dimensional potentials over each coordinate: $V(x) = \frac{3}{50} \sum_{i=1}^{d} \left( x_i^4 - 16 x_i^2 + 5 x_i \right)$, $x = (x_1, \dots, x_{10}) \in \mathbb{R}^{10}$. The initial is the standard Gaussian …
Figure 6
Figure 6: Absolute error of the mean and Frobenius norm of the covariance error vs. recovered time in the 10D block-structured Gaussian example. Parameterization and visualization. We parameterize the path by a non-uniform B-spline Flow [24] with two hidden layers (width 100, SiLU activation), which provides smooth $C^2$-diffeomorphic transports with controlled regularity …
Figure 7
Figure 7: Sample points projected onto the $(x_5, x_6)$-plane at each layer along the learned geometric path for the 10D Styblinski-Tang potential. Reference solution and comparison. Each one-dimensional marginal evolves independently according to the one-dimensional Fokker-Planck equation, equivalently the overdamped Langevin SDE $dX_t = -V_1'(X_t)\,dt + \sqrt{2}\,dB_t$, $V_1(X) = \frac{3}{50} \left( X^4 - 16 X^2 + 5 X \right)$, $X_t \in \mathbb{R}$. Unlike the…
Figure 8
Figure 8: Comparison of marginal densities from the learned path (colored for each component) and from 1D SDE simulation (black) at each layer, with the recovered physical times indicated. 5.2. Interacting Particle Dynamics. We next consider WGFs driven by nonlocal interaction energies with the pairwise term $\frac{1}{2} \int_{\mathbb{R}^d \times \mathbb{R}^d} W(x - y)\,\rho(x)\,\rho(y)\,dx\,dy$. Such models arise in aggregation, swarming, and mean-field dynamics. In cont…
Figure 9
Figure 9: Pure aggregation: particle transport (top) and density evolution (bottom) along the learned geometric path. The terminal state approaches the uniform distribution supported on the unit disk.
Figure 10
Figure 10: Relative $L^2$ difference between the learned densities and the reference solution computed by the primal-dual method. The explicit steady state [5] is the uniform distribution on the annulus with inner and outer radii 𝑅𝑖 = √𝛼1 𝛼2 , 𝑅𝑜 = √(𝑅𝑖² + 1). Starting from a non-radially symmetric initial datum composed of five Gaussians, the learned geometric path restores radial symmetry and converges to the annular…
Figure 11
Figure 11: Aggregation-drift: particle evolution from an asymmetric initial distribution toward the annular steady state. We use this example to show how the geometric path obtained by Algorithm 2 can provide a better non-uniform time mesh as well as a good terminal time for the physical-time path minimization Algorithm 1, as discussed in Remark 5. More precisely, we first compute a geometric path, then recover the …
Figure 12
Figure 12: Comparison between two physical-time discretizations for the aggregation-drift equation: the standard uniform-time mesh and the recovered time mesh obtained from the geometric path. The recovered time allocates more layers to the fast transient regime and produces a smaller cumulative MAM loss. To further examine the structure of the two learned paths, we report in …
Figure 13
Figure 13: Structural diagnostics for the aggregation-drift equation under uniform time and recovered time discretizations. From left to right: free energy, cosine alignment with $-\nabla_d \mathcal{F}$, and the Wasserstein distance between neighboring layers. 5.2.3. Aggregation-Diffusion Equation. We conclude this section with an aggregation-diffusion model that combines nonlinear diffusion and nonlocal attraction: $\mathcal{F}(\rho) = \int_{\mathbb{R}^2}$ 𝜈 𝑚 − …
Figure 14
Figure 14: Aggregation-diffusion: density snapshots along the learned geometric path. Multi-bump transient states gradually merge into a smooth radial steady state.
Figure 15
Figure 15: Free-energy profiles along the gradient flow path. Left: the evolution computed by a conventional primal-dual scheme with a constant time step size 0.5. Right: evolution along the learned geometric path with only 11 images. … time within any specified finite time horizon, and the other is the parametrization-free geometric formulation. The physical-time formulation provides a horizon-based variational descript…
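
The one-dimensional reference simulation mentioned under Figure 7 can be sketched with a plain Euler-Maruyama scheme. This is an illustrative sketch only: the particle count, step size, end time, and function names are assumptions, and the paper may use a different integrator or settings. The drift uses the derivative of the stated potential $V_1(X) = \frac{3}{50}(X^4 - 16X^2 + 5X)$.

```python
import numpy as np

def grad_v1(x):
    # Derivative of V1(X) = (3/50) (X^4 - 16 X^2 + 5 X).
    return (3.0 / 50.0) * (4.0 * x**3 - 32.0 * x + 5.0)

def simulate_langevin(n_particles=5000, t_end=5.0, dt=2e-3, seed=0):
    """Euler-Maruyama for dX = -V1'(X) dt + sqrt(2) dB from a standard Gaussian."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        noise = rng.standard_normal(n_particles)
        x = x - grad_v1(x) * dt + np.sqrt(2.0 * dt) * noise
    return x

samples = simulate_langevin()

# Histogram estimate of the marginal density, for comparison with a learned marginal.
density, edges = np.histogram(samples, bins=60, range=(-5.0, 5.0), density=True)

# Fraction of particles that have left the barrier region around the origin.
outside_barrier = np.mean(np.abs(samples) > 1.0)
print("fraction with |x| > 1:", outside_barrier)
```

Because the standard Gaussian start sits on top of the barrier of this double-well potential, most particles should end up in one of the two wells by the chosen end time, which gives a quick sanity check before comparing densities layer by layer.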
Original abstract

Wasserstein gradient flows (WGFs) describe the evolution of probability distributions in Wasserstein space as steepest descent dynamics for a free energy functional. Computing the full path from an arbitrary initial distribution to equilibrium is challenging, especially in high dimensions. Eulerian methods suffer from the curse of dimensionality, while existing Lagrangian approaches based on particles or generative maps do not naturally improve efficiency through time-step tuning. We propose GenWGP, a generative path-finding framework for Wasserstein gradient paths. GenWGP learns a generative flow that transports mass from an initial density to an unknown equilibrium distribution by minimizing a path loss that encodes the full trajectory and its terminal equilibrium condition. The loss is derived from a geometric action functional motivated by Dawson-Gartner large deviation theory for empirical distributions of interacting diffusion systems. We formulate both a finite-horizon action under physical time parametrization and a reparameterization-invariant geometric action based on Wasserstein arclength. Using normalizing flows, GenWGP computes a geometric curve toward equilibrium while enforcing approximately constant intrinsic speed between adjacent network layers, so that discretized distributions remain nearly equidistant in the Wasserstein metric along the path. This avoids delicate time-stepping constraints and enables stable training that is largely independent of temporal or geometric discretization. Experiments on Fokker-Planck and aggregation-type problems show that GenWGP matches or exceeds high-fidelity reference solutions with only about a dozen discretization points while capturing complex dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes GenWGP, a generative framework that learns normalizing-flow compositions to transport an initial density to equilibrium along a Wasserstein gradient flow path. It derives a path loss from the Dawson-Gartner large-deviation principle for empirical measures of interacting diffusions, formulates both a finite-horizon action and a reparameterization-invariant geometric action based on Wasserstein arclength, and enforces approximately constant intrinsic speed between layers so that discrete distributions remain nearly equidistant in the Wasserstein metric. Experiments on Fokker-Planck and aggregation problems report that the method recovers reference solutions using only about a dozen discretization points.

Significance. If the loss functional is shown to recover the exact WGF trajectory (continuity equation with velocity equal to the Wasserstein gradient of the free energy plus terminal equilibrium), the approach would provide a discretization-robust Lagrangian method for computing full high-dimensional paths, sidestepping the time-stepping sensitivities of existing particle or map-based schemes and exploiting the representational power of normalizing flows.
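
For concreteness, the target dynamics named in the parenthesis above can be written out as follows. This is a standard formulation given as a sketch; the specific free energy (entropy, confining potential $V$, and a symmetric interaction kernel $W$) is an assumed example assembled from the problem types the review mentions, not a formula quoted from the paper.

```latex
% Continuity equation with the Wasserstein gradient-flow velocity:
\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \qquad
v_t = -\nabla \frac{\delta \mathcal{F}}{\delta \rho}(\rho_t).

% Assumed example free energy (W even, i.e. W(-x) = W(x)):
\mathcal{F}(\rho) = \int \rho \log \rho \, dx + \int V \rho \, dx
  + \frac{1}{2} \iint W(x - y)\, \rho(x)\, \rho(y)\, dx\, dy,
\qquad
\frac{\delta \mathcal{F}}{\delta \rho} = \log \rho + 1 + V + W * \rho .

% Resulting evolution equation and energy dissipation:
\partial_t \rho = \Delta \rho + \nabla \cdot (\rho \nabla V) + \nabla \cdot \bigl( \rho \, \nabla (W * \rho) \bigr),
\qquad
\frac{d}{dt} \mathcal{F}(\rho_t) = - \int \Bigl| \nabla \frac{\delta \mathcal{F}}{\delta \rho}(\rho_t) \Bigr|^2 \rho_t \, dx \le 0 .
```

With $W = 0$ this reduces to the Fokker-Planck equation; with the entropy and $V$ removed it is a pure aggregation equation. The terminal equilibrium condition corresponds to the dissipation integrand vanishing.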

major comments (2)
  1. [Abstract / derivation of the loss] Abstract and derivation section: the claim that the path loss 'encodes the full trajectory and its terminal equilibrium condition' rests on the Dawson-Gartner LDP for empirical distributions of interacting diffusions, yet the manuscript provides no rigorous argument that the resulting variational problem is equivalent to the Wasserstein gradient flow in the continuum limit; in particular, fluctuation corrections and the non-commutativity of the mean-field and large-deviation limits are not controlled, which is load-bearing for the assertion that the learned curve satisfies the continuity equation with the correct velocity field.
  2. [Experiments] Experimental section: the statement that GenWGP 'matches or exceeds high fidelity reference solutions with only about a dozen discretization points' is presented without error bars, statistical significance tests, or systematic ablations on the number of layers/points; this weakens the supporting evidence for the claim of robustness to temporal and geometric discretization.
minor comments (2)
  1. [Method] The geometric action is described as 'reparameterization invariant' and 'based on Wasserstein arclength,' but the precise definition of the discrete arclength metric and how it is computed from the flow layers is not stated explicitly enough for reproduction.
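
One plausible reading of the discrete arclength, offered as an assumption and not as the paper's definition, is the root-mean-square displacement of samples across each layer (the Figure 1 residue refers to a "segment norm"). Since each layer map couples two consecutive distributions, this quantity bounds their 2-Wasserstein distance from above. The sketch below uses hypothetical names throughout, with rigid translations standing in for trained flow layers.

```python
import numpy as np

def segment_lengths(samples, layers):
    """Root-mean-square displacement produced by each layer map.

    Each layer map couples consecutive distributions, so this value is an
    upper bound on their 2-Wasserstein distance (tight if the map is optimal).
    """
    x = np.asarray(samples, dtype=float)
    lengths = []
    for layer in layers:
        x_next = layer(x)
        lengths.append(np.sqrt(np.mean(np.sum((x_next - x) ** 2, axis=1))))
        x = x_next
    return np.array(lengths)

def speed_nonuniformity(lengths):
    """Relative variance of segment lengths; zero means constant speed."""
    return float(np.var(lengths) / np.mean(lengths) ** 2)

# Hypothetical stand-in for trained flow layers: rigid translations, for
# which the optimal transport map is the translation itself.
shifts = [np.array([0.3, 0.4]), np.array([0.5, 0.0]), np.array([0.0, -0.5])]
toy_layers = [lambda x, v=v: x + v for v in shifts]

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2000, 2))
lengths = segment_lengths(x0, toy_layers)
print("segment lengths:", lengths)
print("speed non-uniformity:", speed_nonuniformity(lengths))
```

A penalty such as `speed_nonuniformity` is one simple way to encourage equal segment lengths during training; whether the paper enforces constant speed this way or through the loss itself is not stated in the material above.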
  2. [Preliminaries] Notation for the free-energy functional and the interaction kernel in the aggregation examples is introduced without a dedicated table or appendix listing all symbols, making cross-referencing with the loss terms cumbersome.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We address each of the major comments point by point below.

Point-by-point responses
  1. Referee: [Abstract / derivation of the loss] Abstract and derivation section: the claim that the path loss 'encodes the full trajectory and its terminal equilibrium condition' rests on the Dawson-Gartner LDP for empirical distributions of interacting diffusions, yet the manuscript provides no rigorous argument that the resulting variational problem is equivalent to the Wasserstein gradient flow in the continuum limit; in particular, fluctuation corrections and the non-commutativity of the mean-field and large-deviation limits are not controlled, which is load-bearing for the assertion that the learned curve satisfies the continuity equation with the correct velocity field.

    Authors: The path loss is constructed directly from the Dawson-Gartner large-deviation rate functional for the empirical measure of interacting diffusions; its minimizer is the most probable path, which coincides with the Wasserstein gradient flow of the free energy in the mean-field scaling. The manuscript derives both the finite-horizon and geometric-action forms from this principle and shows that the resulting discrete trajectories satisfy the continuity equation with the correct velocity by construction. We acknowledge that a fully rigorous control of fluctuation corrections and the precise interchange of mean-field and large-deviation limits lies beyond the scope of the present work. We will revise the derivation section to state the limiting regime more precisely and to cite the relevant literature on the convergence of the LDP to the WGF, thereby clarifying the scope of the claim without altering the core argument. revision: partial

  2. Referee: [Experiments] Experimental section: the statement that GenWGP 'matches or exceeds high fidelity reference solutions with only about a dozen discretization points' is presented without error bars, statistical significance tests, or systematic ablations on the number of layers/points; this weakens the supporting evidence for the claim of robustness to temporal and geometric discretization.

    Authors: We agree that the experimental evidence would be more convincing with additional statistical support. In the revised manuscript we will include error bars computed over multiple independent training runs with different random seeds, report p-values or other statistical comparisons against the reference solutions where appropriate, and add systematic ablations that vary the number of layers (discretization points) while keeping all other hyperparameters fixed. These additions will directly address the concern about robustness to discretization. revision: yes

Circularity Check

0 steps flagged

No circularity; path loss derived from external Dawson-Gartner LDP, not from network outputs or self-referential definitions

full rationale

The paper's central construction begins with a path loss motivated by the Dawson-Gartner large-deviation principle for empirical measures of interacting diffusions—an external, pre-existing mathematical result independent of the normalizing-flow architecture or its parameters. This loss is then minimized over compositions of flow maps to obtain the generative path; the minimization itself is a standard variational procedure and does not redefine the loss in terms of the fitted outputs. No equation equates a derived quantity to a fitted parameter by construction, no uniqueness theorem is imported from the authors' prior work, and no ansatz is smuggled via self-citation. The geometric reparameterization and constant-speed enforcement follow directly from the Wasserstein arclength definition once the loss is accepted, without circular reduction. The derivation chain therefore remains self-contained against the external benchmark of large-deviation theory.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of the geometric action functional derived from Dawson-Gartner large deviation theory and on the representational power of normalizing flows for the transport maps. No explicit free parameters or new invented entities are stated in the abstract.

axioms (1)
  • domain assumption The path loss encodes the full trajectory and its terminal equilibrium condition via the geometric action functional motivated by Dawson-Gartner large deviation theory for empirical distributions of interacting diffusion systems.
    This is the foundational motivation for the training objective in both finite-horizon and reparameterization-invariant forms.

pith-pipeline@v0.9.0 · 5548 in / 1409 out tokens · 43520 ms · 2026-05-10T15:27:30.610025+00:00 · methodology


Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

  1. [1]

    Adams, S., Dirr, N., Peletier, M., and Zimmer, J. (2013). Large deviations and gradient flows. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences , 371(2005):20120341

  2. [2]

    Ambrosio, L., Gigli, N., and Savaré, G. (2008). Gradient flows: in metric spaces and in the space of probability measures . Springer Science & Business Media

  3. [3]

    Boffi, N. M. and Vanden-Eijnden, E. (2023). Probability flow solution of the Fokker–Planck equation. Machine Learning: Science and Technology, 4(3):035012

  4. [4]

    Boltzmann, L. (1872). Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen, volume 66. Aus der k.k. Hof- und Staatsdruckerei

  5. [5]

    Byun, S.-S. (2024). Planar equilibrium measure problem in the quadratic fields with a point charge. Computational Methods and Function Theory, 24(2):303–332

  6. [6]

    Cai, Z., Cao, Y., Huang, Y., and Zhou, X. (2026). Weak generative sampler to efficiently sample invariant distribution of stochastic differential equation. SIAM Journal on Scientific Computing (to appear; arXiv:2405.19256)

  7. [7]

    Cai, Z., Liu, C., and Zhou, X. (2025). Weak generative sampler for stationary distributions of McKean-Vlasov system. arXiv preprint arXiv:2509.12841

  8. [8]

    Carrillo, J. A., Chertock, A., and Huang, Y. (2015). A finite-volume method for nonlinear nonlocal equations with a gradient flow structure. Communications in Computational Physics , 17(1):233–258

  9. [9]

    Carrillo, J. A., Craig, K., Wang, L., and Wei, C. (2022). Primal dual methods for Wasserstein gradient flows. Foundations of Computational Mathematics, pages 1–55

  10. [10]

    Carrillo, J. A., McCann, R. J., and Villani, C. (2003). Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Revista Matematica Iberoamericana, 19(3):971–1018

  11. [11]

    Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. (2018). Neural ordinary differential equations. Advances in neural information processing systems, 31

  12. [12]

    Dawson, D. A. (1983). Critical dynamics and fluctuations for a mean-field model of cooperative behavior. Journal of Statistical Physics , 31(1):29–85

  13. [13]

    Dawson, D. A. and Gärtner, J. (1987). Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics, 20(4):247–308

  14. [14]

    Dawson, D. A. and Gärtner, J. (1989). Large deviations, free energy functional and quasi-potential for a mean field model of interacting diffusions, volume 78. Memoirs of the American Mathematical Society

  15. [15]

    Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using Real NVP. In International Conference on Learning Representations

  16. [16]

    Durkan, C., Bekasov, A., Murray, I., and Papamakarios, G. (2019). Neural spline flows. Advances in neural information processing systems , 32

  17. [17]

    E, W., Ren, W., and Vanden-Eijnden, E. (2002). String method for the study of rare events. Phys. Rev. B, 66:052301

  18. [18]

    E, W., Ren, W., and Vanden-Eijnden, E. (2004). Minimum action method for the study of rare events. Comm. Pure Appl. Math., 57:637–656

  19. [19]

    Feng, J. and Kurtz, T. G. (2006). Large Deviations for Stochastic Processes, volume 131 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI

  20. [20]

    Freidlin, M. I. and Wentzell, A. D. (2012). Random Perturbations of Dynamical Systems . Grundlehren der mathematischen Wissenschaften. Springer-Verlag, New York, 3 edition

  21. [21]

    Han, J., Wu, Z., Gu, S., and Zhou, X. (2026). StringNET: Neural Network based Variational Method for Transition Pathways. Communications in Computational Physics

  22. [22]

    Heymann, M. and Vanden-Eijnden, E. (2008a). The geometric minimum action method: A least action principle on the space of curves. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences , 61(8):1052–1117

  23. [23]

    Heymann, M. and Vanden-Eijnden, E. (2008b). The geometric minimum action method: a least action principle on the space of curves. Comm. Pure Appl. Math., 61:1052–1117

  24. [24]

    Hong, S. and Chun, S. Y. (2023). Neural diffeomorphic non-uniform B-spline flows. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 12225–12233

  25. [25]

    Hu, Z., Liu, C., Wang, Y., and Xu, Z. (2024). Energetic variational neural network discretizations of gradient flows. SIAM Journal on Scientific Computing, 46(4):A2528–A2556

  26. [26]

    Huang, H., Yu, J., Chen, J., and Lai, R. (2023). Bridging mean-field games and normalizing flows with trajectory regularization. Journal of Computational Physics, 487:112155

  27. [27]

    Huang, Y., Liu, C., and Zhou, X. (2026). Levy Score Function and Score-Based Particle Algorithm for Nonlinear Levy–Fokker–Planck Equations. SIAM Journal on Numerical Analysis (to appear), arXiv 2412.19520

  28. [28]

    Hyvärinen, A. and Dayan, P. (2005). Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4)

  29. [29]

    Jordan, R., Kinderlehrer, D., and Otto, F. (1998). The variational formulation of the Fokker–Planck equation. SIAM journal on mathematical analysis, 29(1):1–17

  30. [30]

    Kobyzev, I., Prince, S. J., and Brubaker, M. A. (2020). Normalizing flows: An introduction and review of current methods. IEEE transactions on pattern analysis and machine intelligence , 43(11):3964–3979. C. Liu and X. Zhou: Preprint submitted to Elsevier Page 25 of 39 Generative Wasserstein Gradient Path Method

  31. [31]

    Lafferty, J. D. (1988). The density manifold and configuration space quantization. Transactions of the American Mathematical Society , 305(2):699–741

  32. [32]

    Lee, W., Wang, L., and Li, W. (2024). Deep JKO: time-implicit particle methods for general nonlinear gradient flows. Journal of Computational Physics, page 113187

  33. [33]

    Li, L., Hurault, S., and Solomon, J. (2023). Self-consistent velocity matching of probability flows. In Thirty-seventh Conference on Neural Information Processing Systems

  34. [34]

    Liu, S., Li, W., Zha, H., and Zhou, H. (2022). Neural parametric Fokker–Planck equation. SIAM Journal on Numerical Analysis, 60(3):1385–1449

  35. [35]

    Lu, J., Wu, Y., and Xiang, Y. (2024). Score-based transport modeling for mean-field Fokker-Planck equations. Journal of Computational Physics, 503:112859

  36. [36]

    Nurbekyan, L., Lei, W., and Yang, Y. (2023). Efficient natural gradient descent methods for large-scale PDE-based optimization problems. SIAM Journal on Scientific Computing , 45(4):A1621–A1655

  37. [37]

    Onsager, L. (1931). Reciprocal relations in irreversible processes. I. Physical Review, 37(4):405

  38. [38]

    Otto, F. (2001). The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations , 26:101–174

  39. [39]

    Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S., and Lakshminarayanan, B. (2021). Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research , 22(57):1–64

  40. [40]

    Peletier, M. A. (2014). Variational modelling: Energies, gradient flows, and large deviations. arXiv preprint arXiv:1402.1990

  41. [41]

    Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics , 378:686–707

  42. [42]

    Rezende, D. and Mohamed, S. (2015). Variational inference with normalizing flows. In International conference on machine learning, pages 1530–1538. PMLR

  43. [43]

    Rousset, M., Stoltz, G., and Lelievre, T. (2010). Free energy computations: a mathematical perspective . World Scientific

  44. [44]

    Shen, Z. and Wang, Z. (2024). Entropy-dissipation informed neural network for McKean-Vlasov type PDEs. Advances in Neural Information Processing Systems, 36

  45. [45]

    Simonnet, E. (2023). Computing non-equilibrium trajectories by a deep learning approach. Journal of Computational Physics , 491:112349

  46. [46]

    Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. (2021). Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations

  47. [47]

    Sun, Y. and Zhou, X. (2018). An improved adaptive minimum action method for the calculation of transition path in non-gradient systems. Communications in Computational Physics , 24(1):44–68

  48. [48]

    Tang, K., Wan, X., and Liao, Q. (2022). Adaptive deep density approximation for Fokker-Planck equations. Journal of Computational Physics, 457:111080

  49. [49]

    Vanden-Eijnden, E. and Heymann, M. (2008). The geometric minimum action method for computing minimum energy paths. J. Chem. Phys., 128:061103

  50. [50]

    Vázquez, J. L. (2007). The porous medium equation: mathematical theory . Oxford university press

  51. [51]

    Villani, C. (2009). Optimal transport: old and new, volume 338 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin

  52. [52]

    Villani, C. (2021). Topics in optimal transportation, volume 58. American Mathematical Soc

  53. [53]

    Wan, X. (2015). A minimum action method with optimal linear time scaling. Communications in Computational Physics , 18(5):1352–1379

  54. [54]

    Xie, H., Li, Z.-H., Wang, H., Zhang, L., and Wang, L. (2023). Deep variational free energy approach to dense hydrogen. Physical Review Letters, 131(12):126501

  55. [55]

    Xu, C., Cheng, X., and Xie, Y. (2023). Normalizing flow neural networks by JKO scheme. In Thirty-seventh Conference on Neural Information Processing Systems

  56. [56]

    Yu, B. et al. (2018). The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12

  57. [57]

    Zhou, X., Ren, W., and E, W. (2008). Adaptive minimum action method for the study of rare events. J. Chem. Phys., 128(10):104111.

    A. Proofs in Section 3. A.1. Proof of Theorem 1. Assumption 1

  58. [58]

    The domain $\Omega$ is a bounded domain of finite measure (in particular, $\mathbb{T}^d$), and the boundary conditions are periodic or no-flux

  59. [59]

    The initial distribution $\rho_0$ is absolutely continuous with respect to the Lebesgue measure (we still denote its density as $\rho_0$) and there exists a positive constant $C_0$ such that $C_0^{-1} \le \rho_0(x) \le C_0$ for all $x \in \Omega$, and $\rho_0 \in C^2(\Omega)$

  60. [60]

    For every $T \ge 0$, the solution $\hat{p}_t \in C^3(\Omega)$ and there is a positive constant $C_T^*$ such that $(C_T^*)^{-1} \le \hat{p}_t(x) \le C_T^*$ for all $x \in \Omega$, $t \in [0, T]$

  61. [61]

    The velocity field $\mathbf{f}_t(x) = \mathbf{f}(x, t) \in C^{2,1}(\Omega \times \mathbb{R}, \mathbb{R}^d)$. For every $T \ge 0$, there is a positive constant $C_T$ such that $\sup_{(t,x) \in \Omega \times [0,T]} |\nabla \cdot \mathbf{f}(x, t)| \le C_T$.

  62. [62]

    The kernel $W \in C^3(\mathbb{R}^d)$ and there exists a positive constant $C_W$ such that $|\nabla W(x)| \le C_W$ for all $x \in \Omega$

  63. [63]

    (Regularity of Generated Density) The density $p_t$ induced by the flow $\Phi$ satisfies the following regularity conditions for $t \in [0, T]$: • There exists a constant $C_T^f$ such that $(C_T^f)^{-1} \le p_t(x) \le C_T^f$ for all $x \in \Omega$. • The score function is bounded: there exists $C_s > 0$ such that $\sup_{x \in \Omega} \|\nabla \log p_t(x)\|_2 \le C_s$. We first present a lemma that establishes the upper...

  64. [64]

    For all $T \ge 0$ and every path $p \in AC_{\rho_a, \rho_b, T}$, the map $t \mapsto \mathcal{F}(p_t)$ is absolutely continuous on $[0, T]$ and satisfies $\frac{d}{dt} \mathcal{F}(p_t) = \langle \nabla_d \mathcal{F}(p_t), \partial_t p_t \rangle_{-1, p_t}$ for a.e. $t \in (0, T)$. Lemma 3. Under Assumption 2, if $S_T[p] < +\infty$, then $\|\partial_t p_t\|_{-1, p_t}$ ∈ 𝐿²(0, ...