Generative Path-Finding Method for Wasserstein Gradient Flow

Chengyu Liu; Xiang Zhou

arXiv: 2604.11519 · v1 · submitted 2026-04-13 · 💻 cs.LG · math-ph · math.MP

Pith reviewed 2026-05-10 15:27 UTC · model grok-4.3

classification 💻 cs.LG · math-ph · math.MP

keywords Wasserstein gradient flow · generative path finding · normalizing flows · large deviation theory · Fokker-Planck · action functional · probability transport


The pith

A generative framework learns paths in Wasserstein space by minimizing an action loss from large deviation theory using normalizing flows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GenWGP, a method that computes the full trajectory of a probability distribution evolving under a Wasserstein gradient flow, from an initial state to equilibrium. It does this by training a generative model to minimize a path loss derived from a geometric action functional based on Dawson-Gärtner large deviation theory. The approach uses normalizing flows to build a curve whose intrinsic speed is approximately constant between layers, keeping successive distributions nearly equidistant in the Wasserstein metric. This lets the method work with coarse discretizations of about a dozen points while training stably and largely independently of the temporal or geometric discretization. A reader would care because it offers a potentially practical way to simulate these flows in high dimensions, where Eulerian grids suffer from the curse of dimensionality and existing particle or generative-map schemes gain little from time-step tuning.

Core claim

GenWGP learns a generative flow that transports mass from an initial density to an unknown equilibrium distribution by minimizing a path loss derived from a geometric action functional motivated by Dawson-Gärtner large deviation theory for empirical distributions of interacting diffusion systems. Both a finite-horizon action under physical-time parametrization and a reparameterization-invariant geometric action based on Wasserstein arclength are formulated. Using normalizing flows, it computes a geometric curve toward equilibrium while enforcing approximately constant intrinsic speed between adjacent network layers, so that discretized distributions remain nearly equidistant in the Wasserstein metric along the path.
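For orientation, the relation between the two actions can be sketched in the form familiar from the geometric minimum action method [22]. The block below uses assumed notation: $\mathcal{F}$ is the free energy, $\operatorname{grad}_W\mathcal{F}(\rho) = -\nabla\cdot\big(\rho\,\nabla\tfrac{\delta\mathcal{F}}{\delta\rho}\big)$ its Wasserstein gradient, $\|\cdot\|_{-1,\rho}$ the weighted $\dot H^{-1}$ norm, and the prefactor $\tfrac14$ corresponds to noise $\sqrt{2}\,\mathrm{d}B_t$. The paper's own symbols and constants may differ.

```latex
% Assumed notation: ||xi||_{-1,rho}^2 = \int rho |grad phi|^2 dx whenever xi = -div(rho grad phi).
% The gradient flow itself is d_t rho = -grad_W F(rho).
\begin{aligned}
S_T[\rho] &= \frac14 \int_0^T \big\| \partial_t \rho_t + \operatorname{grad}_W \mathcal{F}(\rho_t) \big\|_{-1,\rho_t}^2 \,\mathrm{d}t \\
&\ge \frac12 \int_0^T \Big( \|\partial_t \rho_t\|_{-1,\rho_t}\, \|\operatorname{grad}_W \mathcal{F}(\rho_t)\|_{-1,\rho_t}
      + \big\langle \partial_t \rho_t, \operatorname{grad}_W \mathcal{F}(\rho_t) \big\rangle_{-1,\rho_t} \Big) \,\mathrm{d}t \\
&= \frac12 \int_0^1 \Big( \|\partial_\alpha \rho_\alpha\|_{-1,\rho_\alpha}\, \|\operatorname{grad}_W \mathcal{F}(\rho_\alpha)\|_{-1,\rho_\alpha}
      + \big\langle \partial_\alpha \rho_\alpha, \operatorname{grad}_W \mathcal{F}(\rho_\alpha) \big\rangle_{-1,\rho_\alpha} \Big) \,\mathrm{d}\alpha
   =: \hat S[\rho].
\end{aligned}
```

The inequality is $a^2 + b^2 \ge 2ab$. Equality holds when $\|\partial_t\rho_t\|_{-1,\rho_t} = \|\operatorname{grad}_W\mathcal{F}(\rho_t)\|_{-1,\rho_t}$, which is what ties physical time to a geometric curve through $\mathrm{d}t = \|\partial_\alpha\rho_\alpha\|_{-1,\rho_\alpha} / \|\operatorname{grad}_W\mathcal{F}(\rho_\alpha)\|_{-1,\rho_\alpha}\,\mathrm{d}\alpha$. The last line is unchanged under any reparameterization of $\alpha$. By Cauchy-Schwarz, its integrand is nonnegative and vanishes exactly when $\partial_\alpha\rho_\alpha$ points along $-\operatorname{grad}_W\mathcal{F}(\rho_\alpha)$, i.e. on a downhill gradient-flow curve. Taking $\alpha$ proportional to Wasserstein arclength makes $\|\partial_\alpha\rho_\alpha\|_{-1,\rho_\alpha}$ constant, which is the constant-intrinsic-speed condition.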

What carries the argument

The geometric action functional based on Wasserstein arclength, minimized over a normalizing-flow parameterization of the path while enforcing approximately constant intrinsic speed between layers.
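A sample-based check of this constant-speed property could look like the sketch below. It assumes, hypothetically, that layer $k+1$ is produced by pushing the same particles through a map $T_k$. The induced coupling then gives the upper bound $W_2(\rho_k, \rho_{k+1}) \le \big(\mathbb{E}\,|T_k(X) - X|^2\big)^{1/2}$ with $X \sim \rho_k$, which serves as a segment length. The function names and the max/min equidistance score are illustrative choices, not the paper's definitions.

```python
import numpy as np

def segment_lengths(layers):
    """Coupling-based upper bounds on W2 between consecutive layers.

    `layers` is a list of (n_particles, dim) arrays; row i of layer k+1 is
    assumed to be the image of row i of layer k under that layer's map.
    """
    lengths = []
    for x_k, x_next in zip(layers[:-1], layers[1:]):
        disp = x_next - x_k  # per-particle displacement under the map
        lengths.append(np.sqrt(np.mean(np.sum(disp**2, axis=1))))
    return np.array(lengths)

def equidistance_ratio(lengths):
    """max/min of the segment lengths; 1.0 means exactly equidistant."""
    return float(lengths.max() / lengths.min())

# Usage: translate a point cloud in 8 equal steps of norm 0.5.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((1000, 2))
shift = np.array([0.3, 0.4])
layers = [x0 + k * shift for k in range(9)]
seg = segment_lengths(layers)
ratio = equidistance_ratio(seg)
```

For a pure translation the bound is exact. For a learned nonlinear map it only bounds $W_2$ from above, so the score is a diagnostic rather than a certificate.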

If this is right

GenWGP matches or exceeds high-fidelity reference solutions with only about a dozen discretization points.
It enables stable training largely independent of the temporal or geometric discretization.
The method applies to Fokker-Planck and aggregation-type problems while capturing complex dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The constant intrinsic speed enforcement could be adapted to other generative architectures for learning continuous dynamics in distribution space.
This suggests large deviation principles may guide loss design for approximating gradient flows without fine time discretization.
If it generalizes, GenWGP could offer a scalable alternative to particle simulations for high-dimensional equilibrium sampling.

Load-bearing premise

The path loss from Dawson-Gärtner large deviation theory accurately encodes the Wasserstein gradient flow dynamics and the terminal condition, and normalizing flows are sufficiently expressive to represent the transport maps.

What would settle it

A direct comparison on a low-dimensional problem with a known analytic Wasserstein gradient flow solution, checking whether a 12-point discretization yields equidistant Wasserstein distances between consecutive points and the correct terminal distribution.
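A minimal version of such a check can use the Fokker-Planck flow of a quadratic potential, which keeps Gaussians Gaussian and admits closed-form $W_2$ distances. The sketch assumes $V(x) = |x-\mu|^2/2$ at unit temperature, so $\rho_t = \mathcal{N}(m_t, s_t I)$ with $m_t = \mu + (m_0-\mu)e^{-t}$ and $s_t = 1 + (s_0-1)e^{-2t}$. For isotropic Gaussians it uses $W_2^2 = |m_1-m_2|^2 + d(\sqrt{s_1}-\sqrt{s_2})^2$. It places 12 points at equal $W_2$ arclength along the exact path and checks that consecutive distances agree. The parameters and helper names are illustrative assumptions, not the paper's benchmark.

```python
import numpy as np

def ou_gaussian(t, m0, s0, mu):
    """Mean and isotropic variance of rho_t for dX = -(X - mu) dt + sqrt(2) dB."""
    m = mu + (m0 - mu) * np.exp(-t)
    s = 1.0 + (s0 - 1.0) * np.exp(-2.0 * t)
    return m, s

def w2_isotropic(m1, s1, m2, s2):
    """Closed-form W2 between N(m1, s1 I) and N(m2, s2 I)."""
    d = m1.shape[0]
    return np.sqrt(np.sum((m1 - m2) ** 2) + d * (np.sqrt(s1) - np.sqrt(s2)) ** 2)

def arclength_times(m0, s0, mu, n_points=12, t_max=8.0, n_fine=20001):
    """Times that split the exact path on [0, t_max] into equal W2 arclength pieces."""
    t = np.linspace(0.0, t_max, n_fine)
    m, s = ou_gaussian(t[:, None], m0, s0, mu)  # m: (n_fine, d), s: (n_fine, 1)
    d = m0.shape[0]
    seg = np.sqrt(np.sum(np.diff(m, axis=0) ** 2, axis=1)
                  + d * np.diff(np.sqrt(s[:, 0])) ** 2)  # fine-grid W2 increments
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, arc[-1], n_points)
    return np.interp(targets, arc, t)

mu, m0, s0 = np.array([3.0, 3.0]), np.zeros(2), 4.0
times = arclength_times(m0, s0, mu)
chords = np.array([
    w2_isotropic(*ou_gaussian(ta, m0, s0, mu), *ou_gaussian(tb, m0, s0, mu))
    for ta, tb in zip(times[:-1], times[1:])
])
print(times.round(3))
print(chords.round(4))
```

The exact flow slows down like $e^{-t}$, so equal arclength spacing puts short time gaps early and long ones near equilibrium.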

Figures

Figures reproduced from arXiv: 2604.11519 by Chengyu Liu, Xiang Zhou.

**Figure 1.** Transport map learned by Algorithm 2 with $K = 9$ layers for the 2D isotropic Gaussian case. The flow map (indicated by the arrows) transports the initial density toward equilibrium while preserving isotropic structure.

**Figure 2.** Validation of the learned geometric path for the 2D isotropic Gaussian example. (a) Nearly constant segment lengths indicate approximate arc-length parametrization; (b) cosine alignment close to one is consistent with the gradient-flow direction; (c) the density snapshots at two selected times agree well with the exact solution at the recovered physical times.
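The cosine-alignment diagnostic in panel (b) can be illustrated in a setting where the Wasserstein gradient is explicit. The sketch assumes the Fokker-Planck free energy $\mathcal{F}(\rho) = \int V\rho + \int\rho\log\rho$ with the hypothetical choice $V(x) = |x-\mu|^2/2$ and a Gaussian layer $\rho = \mathcal{N}(m, sI)$. For this case the gradient-flow velocity is $v(x) = -\nabla V(x) - \nabla\log\rho(x) = -(x-\mu) + (x-m)/s$. Alignment is measured as the $L^2(\rho)$ cosine between particle displacements and $v$, which is one plausible reading of the metric and not necessarily the paper's formula.

```python
import numpy as np

def wgf_velocity(x, m, s, mu):
    """v(x) = -grad V(x) - grad log rho(x) for V = |x - mu|^2/2 and rho = N(m, s I)."""
    return -(x - mu) + (x - m) / s

def cosine_alignment(x, x_next, v):
    """L2(rho) cosine between the displacement field x_next - x and the velocity v."""
    disp = x_next - x
    num = np.mean(np.sum(disp * v, axis=1))
    den = np.sqrt(np.mean(np.sum(disp**2, axis=1)) * np.mean(np.sum(v**2, axis=1)))
    return float(num / den)

# Usage: advance N(m, s I) by a small time step with the exact Gaussian flow map.
rng = np.random.default_rng(1)
mu, m, s, dt = np.array([3.0, 3.0]), np.zeros(2), 4.0, 0.05
x = m + np.sqrt(s) * rng.standard_normal((5000, 2))
m_next = mu + (m - mu) * np.exp(-dt)
s_next = 1.0 + (s - 1.0) * np.exp(-2.0 * dt)
x_next = m_next + np.sqrt(s_next / s) * (x - m)  # affine map N(m, sI) -> N(m_next, s_next I)
cos = cosine_alignment(x, x_next, wgf_velocity(x, m, s, mu))
print(round(cos, 4))
```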

**Figure 3.** Comparison between the physical-time formulation and the geometric formulation for the 2D isotropic Gaussian example. (a) The uniform time mesh for each layer in Algorithm 1 and the recovered physical time mesh in Algorithm 2; (b) the decay of free energy in physical time; (c) the decay of free energy over layers; (d) the … errors.

2D anisotropic diffusion. For $\mu = [3, 3]^\top$ and $\Sigma = \operatorname{diag}(1, 0.25)$, …
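Panels (b) and (c) track the free energy along the path. A normalizing flow provides both samples and exact log-densities, so $\mathcal{F}(\rho) = \mathbb{E}_\rho[V(X) + \log\rho(X)]$ can be estimated by plain Monte Carlo. The sketch below checks such an estimator against the closed form for a Gaussian layer. The quadratic $V$ and the Gaussian layer are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def free_energy_mc(samples, log_density, potential):
    """Monte Carlo estimate of F(rho) = E[V(X) + log rho(X)] with X ~ rho."""
    return float(np.mean(potential(samples) + log_density(samples)))

def gaussian_log_density(x, m, s):
    """Row-wise log N(x; m, s I) for an isotropic variance s."""
    d = x.shape[1]
    return -0.5 * np.sum((x - m) ** 2, axis=1) / s - 0.5 * d * np.log(2 * np.pi * s)

def gaussian_free_energy(m, s, mu):
    """Closed form of F for V(x) = |x - mu|^2 / 2 and rho = N(m, s I)."""
    d = m.shape[0]
    potential_part = 0.5 * (np.sum((m - mu) ** 2) + d * s)
    neg_entropy_part = -0.5 * d * (np.log(2 * np.pi * s) + 1.0)
    return float(potential_part + neg_entropy_part)

mu, m, s = np.array([3.0, 3.0]), np.zeros(2), 4.0

def quadratic_potential(y):
    """Hypothetical V(y) = |y - mu|^2 / 2, evaluated row-wise."""
    return 0.5 * np.sum((y - mu) ** 2, axis=1)

rng = np.random.default_rng(2)
x = m + np.sqrt(s) * rng.standard_normal((200_000, 2))
estimate = free_energy_mc(x, lambda y: gaussian_log_density(y, m, s), quadratic_potential)
exact = gaussian_free_energy(m, s, mu)
print(estimate, exact)
```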

**Figure 4.** Comparison in the anisotropic Gaussian case. This example examines whether the method remains accurate in a moderate-dimensional setting with coupled and anisotropic substructures. The learned flow from Algorithm 2 captures the expected rotated and anisotropic components in the selected two-dimensional projections.

**Figure 5.** 2D projections of the terminal distribution for the 10D (dimensions 0 to 9) block-structured Gaussian example.

5.1.2. Non-Convex Potential: 10D Styblinski-Tang Potential

We next consider the 10D Styblinski-Tang potential, given by a sum of identical one-dimensional potentials over each coordinate: $V(x) = \frac{3}{50}\sum_{i=1}^{d}\left(x_i^4 - 16x_i^2 + 5x_i\right)$, $x = (x_1, \dots, x_{10}) \in \mathbb{R}^{10}$. The initial distribution is the standard Gaussian …
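Because $V$ is separable, its gradient acts coordinate-wise. The sketch below (function names are illustrative) evaluates $V$ and $\nabla V$ and locates the stationary points of the one-dimensional factor. This exposes the asymmetric double well that makes the example non-convex.

```python
import numpy as np

def styblinski_tang(x):
    """V(x) = 3/50 * sum_i (x_i^4 - 16 x_i^2 + 5 x_i) for x of shape (n, d)."""
    return 3.0 / 50.0 * np.sum(x**4 - 16.0 * x**2 + 5.0 * x, axis=-1)

def styblinski_tang_grad(x):
    """Coordinate-wise gradient: 3/50 * (4 x_i^3 - 32 x_i + 5)."""
    return 3.0 / 50.0 * (4.0 * x**3 - 32.0 * x + 5.0)

# Stationary points of the 1D factor are the roots of 4x^3 - 32x + 5.
crit = np.sort(np.roots([4.0, 0.0, -32.0, 5.0]).real)
v1 = styblinski_tang(crit[:, None])  # evaluate the 1D factor (d = 1)
print(crit.round(4))  # left minimum, local maximum, right minimum
print(v1.round(4))
```

The left well is deeper ($V_1 \approx -4.70$ versus $\approx -3.00$ on the right), so particles started from a standard Gaussian split unevenly between the two wells.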

**Figure 6.** Absolute error of the mean and Frobenius norm of the covariance error vs. recovered time in the 10D block-structured Gaussian example.

Parameterization and visualization. We parameterize the path by a non-uniform B-spline flow [24] with two hidden layers (width 100, SiLU activation), which provides smooth $C^2$-diffeomorphic transports with controlled regularity.

**Figure 7.** Sample points projected onto the $(x_5, x_6)$-plane at each layer along the learned geometric path for the 10D Styblinski-Tang potential.

Reference solution and comparison. Each one-dimensional marginal evolves independently according to the one-dimensional Fokker-Planck equation, equivalently the over-damped Langevin SDE $\mathrm{d}X_t = -V_1'(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t$, $V_1(X) = \frac{3}{50}\left(X^4 - 16X^2 + 5X\right)$, $X_t \in \mathbb{R}$. Unlike the…
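The 1D reference marginals can be generated with a standard Euler-Maruyama discretization of this SDE. The sketch below is a generic implementation. The step size, particle count and standard Gaussian initialization are assumptions for illustration, and the paper's reference solver may differ. Passing the drift gradient as an argument lets the same routine be sanity-checked on an Ornstein-Uhlenbeck process, whose stationary variance is 1.

```python
import numpy as np

def v1_prime(x):
    """Derivative of V1(x) = 3/50 (x^4 - 16 x^2 + 5 x)."""
    return 3.0 / 50.0 * (4.0 * x**3 - 32.0 * x + 5.0)

def euler_maruyama(x0, grad_v, t_end, dt, rng):
    """Simulate dX = -V'(X) dt + sqrt(2) dB for every particle up to t_end."""
    x = np.array(x0, dtype=float)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - grad_v(x) * dt + np.sqrt(2.0 * dt) * noise
    return x

rng = np.random.default_rng(3)
x0 = rng.standard_normal(20_000)  # standard Gaussian initial marginal
x_ref = euler_maruyama(x0, v1_prime, t_end=1.0, dt=1e-3, rng=rng)
hist, edges = np.histogram(x_ref, bins=80, range=(-5, 5), density=True)

# Sanity check with an OU drift V(x) = x^2/2: the variance should approach 1.
x_ou = euler_maruyama(np.zeros(20_000), lambda y: y, t_end=10.0, dt=1e-2, rng=rng)
print(x_ou.var())
```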

**Figure 8.** Comparison of marginal densities from the learned path (colored for each component) and from 1D SDE simulation (black) at each layer, with the recovered physical times indicated.

5.2. Interacting Particle Dynamics

We next consider WGFs driven by nonlocal interaction energies with the pairwise term $\frac{1}{2}\int_{\mathbb{R}^d\times\mathbb{R}^d} W(x-y)\,\rho(x)\,\rho(y)\,\mathrm{d}x\,\mathrm{d}y$. Such models arise in aggregation, swarming, and mean-field dynamics. In cont…
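The pairwise term can be estimated directly from particles. The sketch below assumes i.i.d. samples from $\rho$ and a kernel $W$ given as a vectorized function; the names are illustrative. Dropping the diagonal $i = j$ gives an unbiased U-statistic and discards the self-interaction term, which is infinite for singular kernels such as $-\log|z|$.

```python
import numpy as np

def interaction_energy(x, kernel):
    """U-statistic estimate of (1/2) E[W(X - Y)] for X, Y i.i.d. samples of rho.

    x: array of shape (n, d); kernel maps differences of shape (..., d) to (...).
    """
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]  # all pairwise differences, (n, n, d)
    vals = kernel(diffs)                   # (n, n) kernel values
    np.fill_diagonal(vals, 0.0)            # drop self-interaction terms i == j
    return float(0.5 * vals.sum() / (n * (n - 1)))

def quadratic_kernel(z):
    """W(z) = |z|^2 / 2."""
    return 0.5 * np.sum(z**2, axis=-1)

rng = np.random.default_rng(4)
x = rng.standard_normal((1000, 2))
est = interaction_energy(x, quadratic_kernel)  # exact value for N(0, I_2) is d/2 = 1
print(est)
```

The $O(n^2)$ memory of the pairwise array is fine for a few thousand particles. Larger runs would need chunking or a random subset of pairs.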

**Figure 9.** Pure aggregation: particle transport (top) and density evolution (bottom) along the learned geometric path. The terminal state approaches the uniform distribution supported on the unit disk.

**Figure 10.** Relative $L^2$ difference between the learned densities and the reference solution computed by the primal-dual method.

The explicit steady state [5] is the uniform distribution on the annulus with inner and outer radii $R_i = \sqrt{\alpha_1/\alpha_2}$ and $R_o = \sqrt{R_i^2 + 1}$. Starting from a non-radially symmetric initial datum composed of five Gaussians, the learned geometric path restores radial symmetry and converges to the annular…
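Relative $L^2$ differences like those in Figure 10 are computed on a grid. The sketch below evaluates the annular steady state using only the stated relation $R_o = \sqrt{R_i^2+1}$: the annulus then has area $\pi$, so the uniform probability density equals $1/\pi$. It also defines a grid-based relative $L^2$ error. The grid, the value of $R_i$ and the helper names are placeholder assumptions, and the paper's quadrature may differ.

```python
import numpy as np

def annulus_density(x, y, r_in):
    """Uniform probability density on r_in <= |z| <= sqrt(r_in^2 + 1)."""
    r2 = x**2 + y**2
    inside = (r2 >= r_in**2) & (r2 <= r_in**2 + 1.0)
    return inside / np.pi  # the annulus has area pi, so the density is 1/pi

def relative_l2(p, q, cell_area):
    """||p - q||_2 / ||q||_2 for densities sampled on the same uniform grid."""
    num = np.sqrt(np.sum((p - q) ** 2) * cell_area)
    den = np.sqrt(np.sum(q**2) * cell_area)
    return float(num / den)

xs = np.linspace(-2.0, 2.0, 801)
X, Y = np.meshgrid(xs, xs, indexing="ij")
cell = (xs[1] - xs[0]) ** 2
q = annulus_density(X, Y, r_in=1.0)  # r_in = 1.0 is an arbitrary example value
mass = q.sum() * cell                # should be close to 1
err = relative_l2(1.1 * q, q, cell)  # a uniform 10% overshoot gives 0.1
```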

**Figure 11.** Aggregation-drift: particle evolution from an asymmetric initial distribution toward the annular steady state.

We use this example to show how the geometric path obtained by Algorithm 2 can provide a better non-uniform time mesh, as well as a good terminal time, for the physical-time path minimization in Algorithm 1, as discussed in Remark 5. More precisely, we first compute a geometric path, then recover the …
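Recovering physical time from a geometric path can rest on a standard fact: along a gradient flow the metric speed equals the norm of the gradient. Hence $\mathrm{d}t = \mathrm{d}s / \|\operatorname{grad}_W\mathcal{F}\|$, where $s$ is Wasserstein arclength, $\mathcal{F}$ the free energy and $\operatorname{grad}_W$ its Wasserstein gradient (notation assumed here). The sketch below assumes that per-segment lengths and per-layer gradient norms are already available; how GenWGP estimates them is not shown. It accumulates the increments with the trapezoidal rule, and the check uses a one-dimensional Euclidean gradient flow as a stand-in.

```python
import numpy as np

def recover_times(seg_lengths, grad_norms):
    """Cumulative physical times t_k from dt = ds / |grad F| (trapezoidal rule).

    seg_lengths: (K,) arclength between layers k and k+1.
    grad_norms:  (K+1,) gradient norm at each layer (must be nonzero).
    """
    inv_speed = 1.0 / np.asarray(grad_norms, dtype=float)
    dt = 0.5 * (inv_speed[:-1] + inv_speed[1:]) * np.asarray(seg_lengths, dtype=float)
    return np.concatenate([[0.0], np.cumsum(dt)])

# Check on x' = -x with F(x) = x^2/2: along the path from x = 1 to x = 0.1,
# ds = |dx| and |grad F| = |x|, so the exact arrival time is ln(10).
x = np.linspace(1.0, 0.1, 10_001)
t = recover_times(np.abs(np.diff(x)), np.abs(x))
print(t[-1], np.log(10.0))
```

The gradient vanishes at equilibrium, so a path ending exactly at the minimizer would yield an unbounded recovered time. A finite terminal time therefore corresponds to stopping close to, not at, equilibrium.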

**Figure 12.** Comparison between two physical-time discretizations for the aggregation-drift equation: the standard uniform-time mesh and the recovered time mesh obtained from the geometric path. The recovered time allocates more layers to the fast transient regime and produces a smaller cumulative MAM loss.

To further examine the structure of the two learned paths, we report in …

**Figure 13.** Structural diagnostics for the aggregation-drift equation under uniform-time and recovered-time discretizations. From left to right: free energy, cosine alignment with the negative Wasserstein gradient of the free energy, and the Wasserstein distance between neighboring layers.

5.2.3. Aggregation-Diffusion Equation

We conclude this section with an aggregation-diffusion model that combines nonlinear diffusion and nonlocal attraction: …

**Figure 14.** Aggregation-diffusion: density snapshots along the learned geometric path. Multi-bump transient states gradually merge into a smooth radial steady state.

**Figure 15.** Free-energy profiles along the gradient-flow path. Left: the evolution computed by a conventional primal-dual scheme with a constant time step of 0.5. Right: the evolution along the learned geometric path with only 11 images.

… time within any specified finite time horizon, and the other is the parametrization-free geometric formulation. The physical-time formulation provides a horizon-based variational descript…

Original abstract

Wasserstein gradient flows (WGFs) describe the evolution of probability distributions in Wasserstein space as steepest-descent dynamics for a free energy functional. Computing the full path from an arbitrary initial distribution to equilibrium is challenging, especially in high dimensions. Eulerian methods suffer from the curse of dimensionality, while existing Lagrangian approaches based on particles or generative maps do not naturally improve efficiency through time-step tuning. We propose GenWGP, a generative path-finding framework for Wasserstein gradient paths. GenWGP learns a generative flow that transports mass from an initial density to an unknown equilibrium distribution by minimizing a path loss that encodes the full trajectory and its terminal equilibrium condition. The loss is derived from a geometric action functional motivated by Dawson-Gärtner large deviation theory for empirical distributions of interacting diffusion systems. We formulate both a finite-horizon action under physical-time parametrization and a reparameterization-invariant geometric action based on Wasserstein arclength. Using normalizing flows, GenWGP computes a geometric curve toward equilibrium while enforcing approximately constant intrinsic speed between adjacent network layers, so that discretized distributions remain nearly equidistant in the Wasserstein metric along the path. This avoids delicate time-stepping constraints and enables stable training that is largely independent of temporal or geometric discretization. Experiments on Fokker-Planck and aggregation-type problems show that GenWGP matches or exceeds high-fidelity reference solutions with only about a dozen discretization points while capturing complex dynamics.
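The core mechanism described here, a stack of invertible layers whose intermediate pushforwards form the discretized path, relies on the change-of-variables rule $\log\rho_{k+1}(T_k(x)) = \log\rho_k(x) - \log|\det\nabla T_k(x)|$. The sketch below substitutes hypothetical element-wise affine layers for the learned flows. Every intermediate density is then Gaussian, and the tracked log-densities can be checked exactly.

```python
import numpy as np

def push_through_layers(x0, logp0, scales, shifts):
    """Push samples through hypothetical layers T_k(x) = scales[k] * x + shifts[k].

    Returns the samples and their log-densities at every layer, i.e. a
    discretized path rho_0, rho_1, ..., rho_K represented by particles.
    """
    xs, logps = [x0], [logp0]
    x, logp = x0, logp0
    for a, b in zip(scales, shifts):
        x = a * x + b                            # element-wise affine map
        logp = logp - np.sum(np.log(np.abs(a)))  # subtract log|det Jacobian|
        xs.append(x)
        logps.append(logp)
    return xs, logps

def std_normal_logpdf(x):
    """log N(x; 0, I) evaluated row-wise."""
    return -0.5 * np.sum(x**2, axis=1) - 0.5 * x.shape[1] * np.log(2 * np.pi)

rng = np.random.default_rng(5)
x0 = rng.standard_normal((500, 2))
scales = [np.array([0.9, 0.8])] * 4  # hypothetical contracting layers
shifts = [np.array([0.5, 0.5])] * 4
xs, logps = push_through_layers(x0, std_normal_logpdf(x0), scales, shifts)
```

With learned nonlinear layers, such as the B-spline flows of [24], the log-determinant varies with $x$. The bookkeeping is the same.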

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance and this is the friction.

Desk Editor's Note (private letter to a colleague)

GenWGP combines normalizing flows with a geometric action loss from large-deviation theory to trace full Wasserstein gradient flow paths using coarse discretization, but the exactness of the continuum recovery remains unverified.


The main point is that this paper gives a generative way to compute an entire Wasserstein gradient flow trajectory, not just the endpoint. They stack normalizing flow layers and minimize a path loss that comes from the Dawson-Gärtner large-deviation rate functional for interacting diffusions. The loss has both a finite-horizon version and a reparameterization-invariant geometric version based on Wasserstein arclength, so the network learns to keep roughly constant speed between layers and avoid the usual time-step tuning headaches.

Referee Report

2 major / 2 minor

Summary. The paper proposes GenWGP, a generative framework that learns normalizing-flow compositions to transport an initial density to equilibrium along a Wasserstein gradient flow path. It derives a path loss from the Dawson-Gärtner large-deviation principle for empirical measures of interacting diffusions, formulates both a finite-horizon action and a reparameterization-invariant geometric action based on Wasserstein arclength, and enforces approximately constant intrinsic speed between layers so that discrete distributions remain nearly equidistant in the Wasserstein metric. Experiments on Fokker-Planck and aggregation problems report that the method recovers reference solutions using only about a dozen discretization points.

Significance. If the loss functional is shown to recover the exact WGF trajectory (the continuity equation with velocity field equal to minus the spatial gradient of the free energy's first variation, together with the terminal equilibrium condition), the approach would provide a discretization-robust Lagrangian method for computing full high-dimensional paths, sidestepping the time-stepping sensitivities of existing particle or map-based schemes and exploiting the representational power of normalizing flows.

major comments (2)

[Abstract / derivation of the loss] Abstract and derivation section: the claim that the path loss 'encodes the full trajectory and its terminal equilibrium condition' rests on the Dawson-Gärtner LDP for empirical distributions of interacting diffusions, yet the manuscript provides no rigorous argument that the resulting variational problem is equivalent to the Wasserstein gradient flow in the continuum limit; in particular, fluctuation corrections and the non-commutativity of the mean-field and large-deviation limits are not controlled, which is load-bearing for the assertion that the learned curve satisfies the continuity equation with the correct velocity field.
[Experiments] Experimental section: the statement that GenWGP 'matches or exceeds high fidelity reference solutions with only about a dozen discretization points' is presented without error bars, statistical significance tests, or systematic ablations on the number of layers/points; this weakens the supporting evidence for the claim of robustness to temporal and geometric discretization.

minor comments (2)

[Method] The geometric action is described as 'reparameterization invariant' and 'based on Wasserstein arclength,' but the precise definition of the discrete arclength metric and how it is computed from the flow layers is not stated explicitly enough for reproduction.
[Preliminaries] Notation for the free-energy functional and the interaction kernel in the aggregation examples is introduced without a dedicated table or appendix listing all symbols, making cross-referencing with the loss terms cumbersome.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We address each of the major comments point by point below.


Referee: [Abstract / derivation of the loss] Abstract and derivation section: the claim that the path loss 'encodes the full trajectory and its terminal equilibrium condition' rests on the Dawson-Gärtner LDP for empirical distributions of interacting diffusions, yet the manuscript provides no rigorous argument that the resulting variational problem is equivalent to the Wasserstein gradient flow in the continuum limit; in particular, fluctuation corrections and the non-commutativity of the mean-field and large-deviation limits are not controlled, which is load-bearing for the assertion that the learned curve satisfies the continuity equation with the correct velocity field.

Authors: The path loss is constructed directly from the Dawson-Gärtner large-deviation rate functional for the empirical measure of interacting diffusions; its minimizer is the most probable path, which coincides with the Wasserstein gradient flow of the free energy in the mean-field scaling. The manuscript derives both the finite-horizon and geometric-action forms from this principle and shows that the resulting discrete trajectories satisfy the continuity equation with the correct velocity by construction. We acknowledge that a fully rigorous control of fluctuation corrections and the precise interchange of mean-field and large-deviation limits lies beyond the scope of the present work. We will revise the derivation section to state the limiting regime more precisely and to cite the relevant literature on the convergence of the LDP to the WGF, thereby clarifying the scope of the claim without altering the core argument. revision: partial
Referee: [Experiments] Experimental section: the statement that GenWGP 'matches or exceeds high fidelity reference solutions with only about a dozen discretization points' is presented without error bars, statistical significance tests, or systematic ablations on the number of layers/points; this weakens the supporting evidence for the claim of robustness to temporal and geometric discretization.

Authors: We agree that the experimental evidence would be more convincing with additional statistical support. In the revised manuscript we will include error bars computed over multiple independent training runs with different random seeds, report p-values or other statistical comparisons against the reference solutions where appropriate, and add systematic ablations that vary the number of layers (discretization points) while keeping all other hyperparameters fixed. These additions will directly address the concern about robustness to discretization. revision: yes

Circularity Check

0 steps flagged

No circularity; path loss derived from external Dawson-Gärtner LDP, not from network outputs or self-referential definitions


The paper's central construction begins with a path loss motivated by the Dawson-Gärtner large-deviation principle for empirical measures of interacting diffusions, an external, pre-existing mathematical result independent of the normalizing-flow architecture or its parameters. This loss is then minimized over compositions of flow maps to obtain the generative path; the minimization itself is a standard variational procedure and does not redefine the loss in terms of the fitted outputs. No equation equates a derived quantity to a fitted parameter by construction, no uniqueness theorem is imported from the authors' prior work, and no ansatz is smuggled via self-citation. The geometric reparameterization and constant-speed enforcement follow directly from the Wasserstein arclength definition once the loss is accepted, without circular reduction. The derivation chain therefore remains self-contained against the external benchmark of large-deviation theory.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of the geometric action functional derived from Dawson-Gartner large deviation theory and on the representational power of normalizing flows for the transport maps. No explicit free parameters or new invented entities are stated in the abstract.

axioms (1)

domain assumption: The path loss encodes the full trajectory and its terminal equilibrium condition via the geometric action functional motivated by Dawson-Gärtner large deviation theory for empirical distributions of interacting diffusion systems.
This is the foundational motivation for the training objective in both finite-horizon and reparameterization-invariant forms.



Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

[1]

Adams, S., Dirr, N., Peletier, M., and Zimmer, J. (2013). Large deviations and gradient flows. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(2005):20120341

[2]

Ambrosio, L., Gigli, N., and Savaré, G. (2008). Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media

[3]

Boffi, N. M. and Vanden-Eijnden, E. (2023). Probability flow solution of the Fokker–Planck equation. Machine Learning: Science and Technology, 4(3):035012

[4]

Boltzmann, L. (1872). Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen, volume 66. Aus der k.k. Hof- und Staatsdruckerei

[5]

Byun, S.-S. (2024). Planar equilibrium measure problem in the quadratic fields with a point charge. Computational Methods and Function Theory, 24(2):303–332

[6]

Cai, Z., Cao, Y., Huang, Y., and Zhou, X. (2026). Weak generative sampler to efficiently sample invariant distribution of stochastic differential equation. SIAM Journal on Scientific Computing (to appear), arXiv:2405.19256

[7]

Cai, Z., Liu, C., and Zhou, X. (2025). Weak generative sampler for stationary distributions of McKean-Vlasov system. arXiv preprint arXiv:2509.12841

[8]

Carrillo, J. A., Chertock, A., and Huang, Y. (2015). A finite-volume method for nonlinear nonlocal equations with a gradient flow structure. Communications in Computational Physics, 17(1):233–258

[9]

Carrillo, J. A., Craig, K., Wang, L., and Wei, C. (2022). Primal dual methods for Wasserstein gradient flows. Foundations of Computational Mathematics, pages 1–55

[10]

Carrillo, J. A., McCann, R. J., and Villani, C. (2003). Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Revista Matemática Iberoamericana, 19(3):971–1018

[11]

Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. (2018). Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31

[12]

Dawson, D. A. (1983). Critical dynamics and fluctuations for a mean-field model of cooperative behavior. Journal of Statistical Physics, 31(1):29–85

[13]

Dawson, D. A. and Gärtner, J. (1987). Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics, 20(4):247–308

[14]

Dawson, D. A. and Gärtner, J. (1989). Large deviations, free energy functional and quasi-potential for a mean field model of interacting diffusions, volume 78. Memoirs of the American Mathematical Society

[15]

Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using Real NVP. In International Conference on Learning Representations

[16]

Durkan, C., Bekasov, A., Murray, I., and Papamakarios, G. (2019). Neural spline flows. Advances in Neural Information Processing Systems, 32

[17]

E, W., Ren, W., and Vanden-Eijnden, E. (2002). String method for the study of rare events. Phys. Rev. B, 66:052301

[18]

E, W., Ren, W., and Vanden-Eijnden, E. (2004). Minimum action method for the study of rare events. Comm. Pure Appl. Math., 57:637–656

[19]

Feng, J. and Kurtz, T. G. (2006). Large Deviations for Stochastic Processes, volume 131 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI

[20]

Freidlin, M. I. and Wentzell, A. D. (2012). Random Perturbations of Dynamical Systems. Grundlehren der mathematischen Wissenschaften. Springer-Verlag, New York, 3rd edition

[21]

Han, J., Wu, Z., Gu, S., and Zhou, X. (2026). StringNET: Neural Network based Variational Method for Transition Pathways. Communications in Computational Physics

[22]

Heymann, M. and Vanden-Eijnden, E. (2008a). The geometric minimum action method: A least action principle on the space of curves. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 61(8):1052–1117

[23]

Heymann, M. and Vanden-Eijnden, E. (2008b). The geometric minimum action method: a least action principle on the space of curves. Comm. Pure Appl. Math., 61:1052–1117

[24]

Hong, S. and Chun, S. Y. (2023). Neural diffeomorphic non-uniform B-spline flows. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 12225–12233

[25]

Hu, Z., Liu, C., Wang, Y., and Xu, Z. (2024). Energetic variational neural network discretizations of gradient flows. SIAM Journal on Scientific Computing, 46(4):A2528–A2556

[26]

Huang, H., Yu, J., Chen, J., and Lai, R. (2023). Bridging mean-field games and normalizing flows with trajectory regularization. Journal of Computational Physics, 487:112155

[27]

Huang, Y., Liu, C., and Zhou, X. (2026). Levy Score Function and Score-Based Particle Algorithm for Nonlinear Levy–Fokker–Planck Equations. SIAM Journal on Numerical Analysis (to appear), arXiv:2412.19520

[28]

Hyvärinen, A. and Dayan, P. (2005). Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4)

[29]

Jordan, R., Kinderlehrer, D., and Otto, F. (1998). The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17

[30]

Kobyzev, I., Prince, S. J., and Brubaker, M. A. (2020). Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3964–3979

C. Liu and X. Zhou: Preprint submitted to Elsevier · Generative Wasserstein Gradient Path Method

[31]

Lafferty, J. D. (1988). The density manifold and configuration space quantization. Transactions of the American Mathematical Society, 305(2):699–741

[32]

Lee, W., Wang, L., and Li, W. (2024). Deep JKO: time-implicit particle methods for general nonlinear gradient flows. Journal of Computational Physics, page 113187

[33]

Li, L., Hurault, S., and Solomon, J. (2023). Self-consistent velocity matching of probability flows. In Thirty-seventh Conference on Neural Information Processing Systems

[34]

Liu, S., Li, W., Zha, H., and Zhou, H. (2022). Neural parametric Fokker–Planck equation. SIAM Journal on Numerical Analysis, 60(3):1385–1449

[35]

Lu, J., Wu, Y., and Xiang, Y. (2024). Score-based transport modeling for mean-field Fokker-Planck equations. Journal of Computational Physics, 503:112859

[36]

Nurbekyan, L., Lei, W., and Yang, Y. (2023). Efficient natural gradient descent methods for large-scale PDE-based optimization problems. SIAM Journal on Scientific Computing, 45(4):A1621–A1655

[37]

Onsager, L. (1931). Reciprocal relations in irreversible processes. I. Physical Review, 37(4):405

[38]

Otto, F. (2001). The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26:101–174

[39]

Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S., and Lakshminarayanan, B. (2021). Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1–64

[40]

Peletier, M. A. (2014). Variational modelling: Energies, gradient flows, and large deviations. arXiv preprint arXiv:1402.1990

[41]

Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707

[42]

Rezende, D. and Mohamed, S. (2015). Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530–1538. PMLR

[43]

Rousset, M., Stoltz, G., and Lelievre, T. (2010). Free energy computations: a mathematical perspective. World Scientific

[44]

Shen, Z. and Wang, Z. (2024). Entropy-dissipation informed neural network for McKean-Vlasov type PDEs. Advances in Neural Information Processing Systems, 36

[45]

Simonnet, E. (2023). Computing non-equilibrium trajectories by a deep learning approach. Journal of Computational Physics, 491:112349

[46]

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. (2021). Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations

[47]

Sun, Y. and Zhou, X. (2018). An improved adaptive minimum action method for the calculation of transition path in non-gradient systems. Communications in Computational Physics, 24(1):44–68

[48]

Tang, K., Wan, X., and Liao, Q. (2022). Adaptive deep density approximation for Fokker-Planck equations. Journal of Computational Physics, 457:111080

[49]

Vanden-Eijnden, E. and Heymann, M. (2008). The geometric minimum action method for computing minimum energy paths. J. Chem. Phys., 128:061103

[50]

Vázquez, J. L. (2007). The porous medium equation: mathematical theory. Oxford University Press

[51]

Villani, C. (2009). Optimal transport: old and new, volume 338 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin

[52]

Villani, C. (2021). Topics in optimal transportation, volume 58. American Mathematical Soc.

[53]

Wan, X. (2015). A minimum action method with optimal linear time scaling. Communications in Computational Physics, 18(5):1352–1379

[54]

Xie, H., Li, Z.-H., Wang, H., Zhang, L., and Wang, L. (2023). Deep variational free energy approach to dense hydrogen. Physical Review Letters, 131(12):126501

[55]

Xu, C., Cheng, X., and Xie, Y. (2023). Normalizing flow neural networks by JKO scheme. In Thirty-seventh Conference on Neural Information Processing Systems

[56]

Yu, B. et al. (2018). The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12

[57]

Zhou, X., Ren, W., and E, W. (2008). Adaptive minimum action method for the study of rare events. J. Chem. Phys., 128(10):104111

A. Proofs in Section 3. A.1. Proof of Theorem 1. Assumption 1

[58]

The domain Ω is a bounded domain of ﬁnite measure (in particular, 𝕋 𝑑), and the boundary conditions are periodic or no-ﬂux

work page
[59]

The initial distribution 𝜌0 is absolutely continuous with respect to the Lebesgue measure (we still denote its density as 𝜌0) and there exists a positive constant 𝐶0 such that (𝐶0)−1 ≤ 𝜌0(𝑥) ≤ 𝐶0 for all 𝑥 ∈ Ω , and 𝜌0 ∈ 𝐶 2(Ω)

work page
[60]

For every 𝑇 ≥ 0, the solution ̂ 𝑝𝑡 ∈ 𝐶 3(Ω) and there is a positive constant 𝐶 ∗ 𝑇 such that (𝐶 ∗ 𝑇 )−1 ≤ ̂ 𝑝𝑡(𝑥) ≤ 𝐶 ∗ 𝑇 for all 𝑥 ∈ Ω, 𝑡 ∈ [0, 𝑇 ]

work page
[61]

For every 𝑇 ≥ 0, there is a positive constant 𝐶𝑇 such that sup(𝑡,𝑥)∈Ω×[0,𝑇 ] |∇ ⋅ 𝐟 (𝑥, 𝑡)| ≤ 𝐶𝑇

The velocity ﬁeld 𝐟𝑡(𝑥) = 𝐟 (𝑥, 𝑡) ∈ 𝐶 2,1(Ω × ℝ, ℝ𝑑). For every 𝑇 ≥ 0, there is a positive constant 𝐶𝑇 such that sup(𝑡,𝑥)∈Ω×[0,𝑇 ] |∇ ⋅ 𝐟 (𝑥, 𝑡)| ≤ 𝐶𝑇 . C. Liu and X. Zhou: Preprint submitted to Elsevier Page 26 of 39 Generative Wasserstein Gradient Path Method

work page
[62]

The kernel 𝑊 ∈ 𝐶 3(ℝ𝑑) and there exists a positive constant 𝐶𝑊 such that |∇𝑊 (𝑥)| ≤ 𝐶𝑊 for all 𝑥 ∈ Ω

work page
[63]

• The score function is bounded: there exists 𝐶𝑠 > 0 such that sup𝑥∈Ω ‖∇ log 𝑝𝑡(𝑥)‖2 ≤ 𝐶𝑠

(Regularity of Generated Density) The density 𝑝𝑡 induced by the ﬂow Φ satisﬁes the following regularity conditions for 𝑡 ∈ [0, 𝑇 ]: • There exists a constant 𝐶 𝑓 𝑇 such that (𝐶 𝑓 𝑇 )−1 ≤ 𝑝𝑡(𝑥) ≤ 𝐶 𝑓 𝑇 for all 𝑥 ∈ Ω. • The score function is bounded: there exists 𝐶𝑠 > 0 such that sup𝑥∈Ω ‖∇ log 𝑝𝑡(𝑥)‖2 ≤ 𝐶𝑠. We ﬁrst present a lemma that establishes the upper...

work page
[64]

𝑡 ∈ (0, 𝑇 )

For ∀𝑇 ≥ 0 and every path 𝑝 ∈ 𝐴𝐶𝜌𝑎,𝜌𝑏,𝑇 , the map 𝑡 ↦  (𝑝𝑡) is absolutely continuous on [0, 𝑇 ] and satisﬁes 𝑑 𝑑𝑡  (𝑝𝑡) = ⟨ ∇d  (𝑝𝑡), 𝜕𝑡𝑝𝑡 ⟩ −1,𝑝𝑡 for a.e. 𝑡 ∈ (0, 𝑇 ). C. Liu and X. Zhou: Preprint submitted to Elsevier Page 32 of 39 Generative Wasserstein Gradient Path Method Lemma 3. Under the assumptions 2 ,if 𝑆𝑇 [𝑝] < +∞, then ‖𝜕𝑡𝑝𝑡‖−1,𝑝𝑡 ∈ 𝐿2(0, ...

work page

[1] Adams, S., Dirr, N., Peletier, M., and Zimmer, J. (2013). Large deviations and gradient flows. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(2005):20120341.

[2] Ambrosio, L., Gigli, N., and Savaré, G. (2008). Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media.

[3] Boffi, N. M. and Vanden-Eijnden, E. (2023). Probability flow solution of the Fokker–Planck equation. Machine Learning: Science and Technology, 4(3):035012.

[4] Boltzmann, L. (1872). Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen, volume 66. Aus der k.k. Hof- und Staatsdruckerei.

[5] Byun, S.-S. (2024). Planar equilibrium measure problem in the quadratic fields with a point charge. Computational Methods and Function Theory, 24(2):303–332.

[6] Cai, Z., Cao, Y., Huang, Y., and Zhou, X. (2026). Weak generative sampler to efficiently sample invariant distribution of stochastic differential equation. SIAM Journal on Scientific Computing (to appear), arXiv:2405.19256.

[7] Cai, Z., Liu, C., and Zhou, X. (2025). Weak generative sampler for stationary distributions of McKean-Vlasov system. arXiv preprint arXiv:2509.12841.

[8] Carrillo, J. A., Chertock, A., and Huang, Y. (2015). A finite-volume method for nonlinear nonlocal equations with a gradient flow structure. Communications in Computational Physics, 17(1):233–258.

[9] Carrillo, J. A., Craig, K., Wang, L., and Wei, C. (2022). Primal dual methods for Wasserstein gradient flows. Foundations of Computational Mathematics, pages 1–55.

[10] Carrillo, J. A., McCann, R. J., and Villani, C. (2003). Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Revista Matematica Iberoamericana, 19(3):971–1018.

[11] Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. (2018). Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31.

[12] Dawson, D. A. (1983). Critical dynamics and fluctuations for a mean-field model of cooperative behavior. Journal of Statistical Physics, 31(1):29–85.

[13] Dawson, D. A. and Gärtner, J. (1987). Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics, 20(4):247–308.

[14] Dawson, D. A. and Gärtner, J. (1989). Large deviations, free energy functional and quasi-potential for a mean field model of interacting diffusions, volume 78. Memoirs of the American Mathematical Society.

[15] Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using Real NVP. In International Conference on Learning Representations.

[16] Durkan, C., Bekasov, A., Murray, I., and Papamakarios, G. (2019). Neural spline flows. Advances in Neural Information Processing Systems, 32.

[17] E, W., Ren, W., and Vanden-Eijnden, E. (2002). String method for the study of rare events. Phys. Rev. B, 66:052301.

[18] E, W., Ren, W., and Vanden-Eijnden, E. (2004). Minimum action method for the study of rare events. Comm. Pure Appl. Math., 57:637–656.

[19] Feng, J. and Kurtz, T. G. (2006). Large Deviations for Stochastic Processes, volume 131 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI.

[20] Freidlin, M. I. and Wentzell, A. D. (2012). Random Perturbations of Dynamical Systems. Grundlehren der mathematischen Wissenschaften. Springer-Verlag, New York, 3rd edition.

[21] Han, J., Wu, Z., Gu, S., and Zhou, X. (2026). StringNET: Neural Network based Variational Method for Transition Pathways. Communications in Computational Physics.

[22] Heymann, M. and Vanden-Eijnden, E. (2008a). The geometric minimum action method: A least action principle on the space of curves. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 61(8):1052–1117.

[23] Heymann, M. and Vanden-Eijnden, E. (2008b). The geometric minimum action method: a least action principle on the space of curves. Comm. Pure Appl. Math., 61:1052–1117.

[24] Hong, S. and Chun, S. Y. (2023). Neural diffeomorphic non-uniform B-spline flows. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 12225–12233.

[25] Hu, Z., Liu, C., Wang, Y., and Xu, Z. (2024). Energetic variational neural network discretizations of gradient flows. SIAM Journal on Scientific Computing, 46(4):A2528–A2556.

[26] Huang, H., Yu, J., Chen, J., and Lai, R. (2023). Bridging mean-field games and normalizing flows with trajectory regularization. Journal of Computational Physics, 487:112155.

[27] Huang, Y., Liu, C., and Zhou, X. (2026). Levy Score Function and Score-Based Particle Algorithm for Nonlinear Levy–Fokker–Planck Equations. SIAM Journal on Numerical Analysis (to appear), arXiv:2412.19520.

[28] Hyvärinen, A. and Dayan, P. (2005). Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4).

[29] Jordan, R., Kinderlehrer, D., and Otto, F. (1998). The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17.

[30] Kobyzev, I., Prince, S. J., and Brubaker, M. A. (2020). Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3964–3979.

C. Liu and X. Zhou: Preprint submitted to Elsevier Page 25 of 39 Generative Wasserstein Gradient Path Method

[31] Lafferty, J. D. (1988). The density manifold and configuration space quantization. Transactions of the American Mathematical Society, 305(2):699–741.

[32] Lee, W., Wang, L., and Li, W. (2024). Deep JKO: time-implicit particle methods for general nonlinear gradient flows. Journal of Computational Physics, page 113187.

[33] Li, L., Hurault, S., and Solomon, J. (2023). Self-consistent velocity matching of probability flows. In Thirty-seventh Conference on Neural Information Processing Systems.

[34] Liu, S., Li, W., Zha, H., and Zhou, H. (2022). Neural parametric Fokker–Planck equation. SIAM Journal on Numerical Analysis, 60(3):1385–1449.

[35] Lu, J., Wu, Y., and Xiang, Y. (2024). Score-based transport modeling for mean-field Fokker-Planck equations. Journal of Computational Physics, 503:112859.

[36] Nurbekyan, L., Lei, W., and Yang, Y. (2023). Efficient natural gradient descent methods for large-scale PDE-based optimization problems. SIAM Journal on Scientific Computing, 45(4):A1621–A1655.

[37] Onsager, L. (1931). Reciprocal relations in irreversible processes. I. Physical Review, 37(4):405.

[38] Otto, F. (2001). The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26:101–174.

[39] Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S., and Lakshminarayanan, B. (2021). Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1–64.

[40] Peletier, M. A. (2014). Variational modelling: Energies, gradient flows, and large deviations. arXiv preprint arXiv:1402.1990.

[41] Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707.

[42] Rezende, D. and Mohamed, S. (2015). Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530–1538. PMLR.

[43] Rousset, M., Stoltz, G., and Lelievre, T. (2010). Free energy computations: a mathematical perspective. World Scientific.

[44] Shen, Z. and Wang, Z. (2024). Entropy-dissipation informed neural network for McKean-Vlasov type PDEs. Advances in Neural Information Processing Systems, 36.

[45] Simonnet, E. (2023). Computing non-equilibrium trajectories by a deep learning approach. Journal of Computational Physics, 491:112349.

[46] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. (2021). Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations.

[47] Sun, Y. and Zhou, X. (2018). An improved adaptive minimum action method for the calculation of transition path in non-gradient systems. Communications in Computational Physics, 24(1):44–68.

[48] Tang, K., Wan, X., and Liao, Q. (2022). Adaptive deep density approximation for Fokker-Planck equations. Journal of Computational Physics, 457:111080.

[49] Vanden-Eijnden, E. and Heymann, M. (2008). The geometric minimum action method for computing minimum energy paths. J. Chem. Phys., 128:061103.

[50] Vázquez, J. L. (2007). The porous medium equation: mathematical theory. Oxford University Press.

[51] Villani, C. (2009). Optimal transport: old and new, volume 338 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin.

[52] Villani, C. (2021). Topics in optimal transportation, volume 58. American Mathematical Society.

[53] Wan, X. (2015). A minimum action method with optimal linear time scaling. Communications in Computational Physics, 18(5):1352–1379.

[54] Xie, H., Li, Z.-H., Wang, H., Zhang, L., and Wang, L. (2023). Deep variational free energy approach to dense hydrogen. Physical Review Letters, 131(12):126501.

[55] Xu, C., Cheng, X., and Xie, Y. (2023). Normalizing flow neural networks by JKO scheme. In Thirty-seventh Conference on Neural Information Processing Systems.

[56] Yu, B. et al. (2018). The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12.

[57] Zhou, X., Ren, W., and E, W. (2008). Adaptive minimum action method for the study of rare events. J. Chem. Phys., 128(10):104111.

A. Proofs in Section 3

A.1. Proof of Theorem 1

Assumption 1.

• The domain $\Omega$ is a bounded domain of finite measure (in particular, $\mathbb{T}^d$), and the boundary conditions are periodic or no-flux.
• The initial distribution $\rho_0$ is absolutely continuous with respect to the Lebesgue measure (we still denote its density by $\rho_0$), there exists a positive constant $C_0$ such that $C_0^{-1} \le \rho_0(x) \le C_0$ for all $x \in \Omega$, and $\rho_0 \in C^2(\Omega)$.
• For every $T \ge 0$, the solution satisfies $\hat{p}_t \in C^3(\Omega)$, and there is a positive constant $C^*_T$ such that $(C^*_T)^{-1} \le \hat{p}_t(x) \le C^*_T$ for all $x \in \Omega$, $t \in [0, T]$.
• The velocity field satisfies $\mathbf{f}_t(x) = \mathbf{f}(x, t) \in C^{2,1}(\Omega \times \mathbb{R}, \mathbb{R}^d)$. For every $T \ge 0$, there is a positive constant $C_T$ such that $\sup_{(x,t) \in \Omega \times [0,T]} |\nabla \cdot \mathbf{f}(x, t)| \le C_T$.
• The kernel satisfies $W \in C^3(\mathbb{R}^d)$, and there exists a positive constant $C_W$ such that $|\nabla W(x)| \le C_W$ for all $x \in \Omega$.
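The two-sided bounds $C^{-1} \le \rho \le C$ in Assumption 1, and the sup-norm of the score $\nabla \log \rho$ that the paper bounds for the generated density, can be probed numerically when a candidate density is available on a grid. The sketch below is illustrative only and is not part of the paper. It assumes a one-dimensional torus $[0, L)$ sampled uniformly and uses a periodic central difference for the score; a no-flux domain would need a different boundary stencil. The function name `density_bounds_and_score` is hypothetical.

```python
import numpy as np


def density_bounds_and_score(rho, length=1.0):
    """Probe two-sided bounds and the score sup-norm of a 1D periodic density.

    Hypothetical helper (not from the paper). `rho` holds density values on the
    uniform grid x_i = i * length / n, i = 0, ..., n - 1, of the torus [0, length).

    Returns (C, score_sup). C is the smallest constant with
    C**-1 <= rho_i <= C at every grid point. score_sup approximates
    sup_x |d/dx log rho(x)| with a periodic central difference.
    """
    rho = np.asarray(rho, dtype=float)
    if rho.ndim != 1 or rho.size < 3:
        raise ValueError("expected a 1D array with at least 3 grid values")
    if np.any(rho <= 0.0):
        raise ValueError("a two-sided bound needs a strictly positive density")
    dx = length / rho.size
    # C must dominate both max(rho) and 1 / min(rho).
    C = max(rho.max(), 1.0 / rho.min())
    log_rho = np.log(rho)
    # Periodic central difference (log_rho[i+1] - log_rho[i-1]) / (2 dx);
    # a no-flux domain would need one-sided stencils at the boundary.
    score = (np.roll(log_rho, -1) - np.roll(log_rho, 1)) / (2.0 * dx)
    return float(C), float(np.abs(score).max())


x = np.arange(1000) / 1000.0
C, score_sup = density_bounds_and_score(1.0 + 0.5 * np.cos(2.0 * np.pi * x))
print(C, score_sup)
```

For $\rho(x) = 1 + \tfrac12\cos(2\pi x)$ on $[0,1)$, the smallest admissible constant is $C = 2$ (set by $\min\rho = 1/2$), and the exact score bound is $2\pi/\sqrt{3} \approx 3.63$. On a grid of 1000 points the central-difference estimate agrees with this value to about $10^{-5}$.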

(Regularity of Generated Density) The density $p_t$ induced by the flow $\Phi$ satisfies the following regularity conditions for $t \in [0, T]$:

• There exists a constant $C^f_T$ such that $(C^f_T)^{-1} \le p_t(x) \le C^f_T$ for all $x \in \Omega$.
• The score function is bounded: there exists $C_s > 0$ such that $\sup_{x \in \Omega} \|\nabla \log p_t(x)\|_2 \le C_s$.

We first present a lemma that establishes the upper...

For every 𝑇 ≥ 0 and every path 𝑝 ∈ 𝐴𝐶𝜌𝑎,𝜌𝑏,𝑇, the map 𝑡 ↦  (𝑝𝑡) is absolutely continuous on [0, 𝑇] and satisfies 𝑑/𝑑𝑡  (𝑝𝑡) = ⟨∇d  (𝑝𝑡), 𝜕𝑡𝑝𝑡⟩−1,𝑝𝑡 for a.e. 𝑡 ∈ (0, 𝑇).

Lemma 3. Under Assumption 2, if 𝑆𝑇[𝑝] < +∞, then ‖𝜕𝑡𝑝𝑡‖−1,𝑝𝑡 ∈ 𝐿2(0, ...
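Lemma 3 controls the weighted negative Sobolev norm $\|\partial_t p_t\|_{-1,p_t}$. Its definition is not reproduced in this excerpt, so the sketch below assumes the convention common in Otto calculus: $\|s\|_{-1,p}^2 = \int p\,|\nabla\varphi|^2\,dx$, where $-\nabla\cdot(p\nabla\varphi) = s$ and $\int s\,dx = 0$. On a one-dimensional periodic grid the elliptic problem can be integrated once in closed form, which gives a short routine. The function name `weighted_hminus1_norm` and the discretization are illustrative choices, not the authors' implementation.

```python
import numpy as np


def weighted_hminus1_norm(s, p, length=1.0):
    """Approximate ||s||_{-1,p} for zero-mass s on a uniform 1D periodic grid.

    Assumed convention (not taken from the paper):
        ||s||_{-1,p}^2 = int p |phi'|^2 dx,  where  -(p phi')' = s,
    with phi periodic on [0, length). Integrating once gives p phi' = c - S,
    S(x) = int_0^x s(y) dy, so the norm is sqrt(int (c - S)^2 / p dx).
    """
    s = np.asarray(s, dtype=float)
    p = np.asarray(p, dtype=float)
    if s.shape != p.shape or s.ndim != 1:
        raise ValueError("s and p must be 1D arrays on the same grid")
    if np.any(p <= 0.0):
        raise ValueError("p must be strictly positive")
    dx = length / s.size
    if abs(s.sum()) * dx > 1e-8 * max(1.0, np.abs(s).sum() * dx):
        raise ValueError("s must have zero total mass, like d/dt of a density")
    # Antiderivative of s, second-order accurate up to an additive constant;
    # any constant offset is absorbed exactly by the choice of c below.
    S = dx * (np.cumsum(s) - 0.5 * s)
    inv_p = 1.0 / p
    # Periodicity of phi requires int phi' dx = int (c - S) / p dx = 0.
    c = np.sum(S * inv_p) / np.sum(inv_p)
    flux = c - S  # discrete values of p * phi'
    return float(np.sqrt(np.sum(flux**2 * inv_p) * dx))


x = np.arange(2000) / 2000.0
print(weighted_hminus1_norm(np.cos(2.0 * np.pi * x), np.ones_like(x)))
```

As a check, for $p \equiv 1$ on $[0,1)$ and $s(x) = \cos(2\pi x)$, one has $\varphi' = -\sin(2\pi x)/(2\pi)$. The exact value is therefore $1/(2\sqrt{2}\,\pi) \approx 0.1125$, which the routine reproduces on a fine grid. Solving the one-dimensional problem by a single quadrature avoids a linear solve. In higher dimensions, the same norm would require solving the weighted elliptic equation numerically.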