Optimal and Diffusion Transports in Machine Learning
Several problems in machine learning are naturally expressed as the design and analysis of time-evolving probability distributions. These include sampling via diffusion methods, optimizing the weights of neural networks, and analyzing the evolution of token distributions across the layers of large language models. While the targeted applications differ (samples, weights, tokens), their mathematical descriptions share a common structure. A key idea is to switch from the Eulerian representation of densities to their Lagrangian counterpart through vector fields that advect particles. This dual view introduces challenges, notably the non-uniqueness of Lagrangian vector fields, but also opportunities to craft density evolutions and flows with favorable properties in terms of regularity, stability, and computational tractability. This survey presents an overview of these methods, with emphasis on two complementary approaches: diffusion methods, which rely on stochastic interpolation processes and underpin modern generative AI, and optimal transport, which defines interpolation by minimizing displacement cost. We illustrate how both approaches appear in applications ranging from sampling and neural network optimization to modeling the dynamics of transformers for large language models.
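As a minimal illustration of the Lagrangian view and of interpolation by displacement cost, the sketch below moves particles between two 1D sample sets. It is not taken from the survey: it relies on the standard fact that, in one dimension with quadratic cost and equal-size samples, the optimal matching pairs points in sorted order. The function name `displacement_interpolation` and the Gaussian test data are hypothetical choices made for this example.

```python
import numpy as np


def displacement_interpolation(source, target, t):
    """Particle positions at time t along a 1D quadratic-cost transport path.

    Assumption: source and target are equal-size 1D samples, so the optimal
    matching pairs the i-th smallest source point with the i-th smallest
    target point.
    """
    source = np.sort(np.asarray(source, dtype=float))
    target = np.sort(np.asarray(target, dtype=float))
    if source.shape != target.shape:
        raise ValueError("source and target must have the same number of samples")
    # Lagrangian description: each particle travels in a straight line
    # with constant velocity (target - source).
    return (1.0 - t) * source + t * target


# Hypothetical data: two Gaussian clouds with different means and spreads.
rng = np.random.default_rng(0)
x0 = rng.normal(loc=-2.0, scale=0.5, size=1000)
x1 = rng.normal(loc=3.0, scale=1.0, size=1000)

# Particles halfway along the path; their histogram approximates the
# intermediate density (the Eulerian description).
x_mid = displacement_interpolation(x0, x1, 0.5)
```

The particles carry the Lagrangian description (positions and velocities), while a histogram of `x_mid` recovers an Eulerian one (a density at a fixed time). In higher dimensions the sorted matching no longer applies and a general optimal transport solver would be needed.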
Forward citations
Cited by 4 Pith papers
- The physics of AI weather models
  AI weather models may simulate the atmosphere via particle positions in latent space whose updates follow gradient flow on a learned free energy functional rather than conventional physical equations.
- Generative Modeling by Value-Driven Transport
  A control-theoretic linear program yields value-driven transport policies for generative modeling with straight paths and simulation-free training.
- On The Hidden Biases of Flow Matching Samplers
  Empirical flow matching introduces coupled biases from plug-in estimation, including altered statistical targets, non-gradient minimizers, and non-unique dynamics via flux-null fields, with base distribution controlli...
- Unbalanced Optimal Transport and Density Control for Discrete-Time Linear Systems
  Unbalanced optimal transport and unbalanced density control for discrete-time linear systems with Gaussian references admit globally optimal convex formulations analogous to covariance steering.