Adversarial Computation of Optimal Transport Maps
Pith reviewed 2026-05-25 17:48 UTC · model grok-4.3
The pith
A GAN with a 2-Wasserstein discriminator makes its generator follow the Wasserstein geodesic and produce an optimal transport map at convergence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that during training, our generator follows the W_2-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training.
What carries the argument
The discriminator objective equal to the 2-Wasserstein metric, which enforces the generator to trace the geodesic in Wasserstein space.
Load-bearing premise
The adversarial training dynamics with a discriminator objective equal to the 2-Wasserstein metric will cause the generator to follow the geodesic.
What would settle it
An experiment or calculation showing that the sequence of intermediate generator distributions deviates from the W2 geodesic under the proposed training procedure.
Figures
read the original abstract
Computing optimal transport maps between high-dimensional and continuous distributions is a challenging problem in optimal transport (OT). Generative adversarial networks (GANs) are powerful generative models which have been successfully applied to learn maps across high-dimensional domains. However, little is known about the nature of the map learned with a GAN objective. To address this problem, we propose a generative adversarial model in which the discriminator's objective is the $2$-Wasserstein metric. We show that during training, our generator follows the $W_2$-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training. We validate our approach empirically in both low-dimensional and high-dimensional continuous settings, and show that it outperforms prior methods on image data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a generative adversarial model in which the discriminator objective is set to the 2-Wasserstein metric. It asserts that the resulting training dynamics cause the generator to follow the W_2 geodesic between the initial and target distributions, thereby reproducing an optimal transport map at convergence. Empirical validation is reported on low-dimensional continuous distributions and high-dimensional image data, with claimed outperformance over prior methods.
Significance. If the geodesic-flow claim can be placed on a rigorous footing, the work would establish a direct link between a specific adversarial objective and optimal transport geometry, providing a principled route to high-dimensional OT maps that avoids explicit linear programming or entropic regularization. The reported empirical gains on image data would then constitute evidence of practical utility beyond existing GAN-based transport estimators.
major comments (2)
- [Abstract] Abstract: the central claim that 'our generator follows the W_2-geodesic' is stated without any derivation, without the continuous-time ODE that the discrete updates are asserted to approximate, and without the conditions (exact discriminator optimality at every generator step, sufficient network capacity, vanishing step-size) required for the optimality guarantee to hold. This assertion is load-bearing for the paper's main conclusion.
- [Theoretical development] Theoretical development (wherever the geodesic property is asserted): no argument is supplied showing that the generator parameter trajectory in distribution space coincides with the W_2 geodesic when the discriminator objective equals W_2; in particular, the manuscript does not address the gap between finite-capacity discriminators, finite step sizes, and the exact Wasserstein gradient flow needed for the 'reproduces an optimal map' statement.
minor comments (1)
- The abstract states that the method 'outperforms prior methods on image data' but supplies no quantitative table, no description of the baselines, and no protocol for measuring map quality (e.g., how the learned map is evaluated against a ground-truth transport).
Simulated Author's Rebuttal
We thank the referee for the careful and constructive report. The two major comments correctly identify that the central geodesic claim is asserted without a full derivation or explicit conditions. We will revise the manuscript to address both points by adding the requested theoretical details.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that 'our generator follows the W_2-geodesic' is stated without any derivation, without the continuous-time ODE that the discrete updates are asserted to approximate, and without the conditions (exact discriminator optimality at every generator step, sufficient network capacity, vanishing step-size) required for the optimality guarantee to hold. This assertion is load-bearing for the paper's main conclusion.
Authors: We agree the abstract states the claim concisely without supporting details. In revision we will expand the abstract to note the key assumptions (exact discriminator optimality at each step, sufficient capacity, and vanishing step size) and explicitly reference the new theoretical subsection that derives the continuous-time ODE limit. revision: yes
-
Referee: [Theoretical development] Theoretical development (wherever the geodesic property is asserted): no argument is supplied showing that the generator parameter trajectory in distribution space coincides with the W_2 geodesic when the discriminator objective equals W_2; in particular, the manuscript does not address the gap between finite-capacity discriminators, finite step sizes, and the exact Wasserstein gradient flow needed for the 'reproduces an optimal map' statement.
Authors: We acknowledge that the submitted manuscript contains only an informal argument based on the Wasserstein objective and does not supply a rigorous derivation of the parameter trajectory or discuss the finite-capacity / finite-step-size gap. We will add a dedicated subsection deriving the continuous-time Wasserstein gradient flow under the stated conditions and explicitly stating the limitations that arise when those conditions are relaxed. revision: yes
Circularity Check
No circularity: geodesic claim follows from W2 objective by standard OT properties
full rationale
The paper sets the discriminator objective equal to the 2-Wasserstein metric and states that the generator therefore follows the W2 geodesic during training. This is a direct mathematical consequence of the known geodesic property of W2 (not a redefinition or fit). No self-citation chains, uniqueness theorems from prior work, or renaming of empirical patterns appear in the abstract or described derivation. The result is not equivalent to its inputs by construction; the dynamics claim is an independent consequence that could be falsified if the min-max does not produce exact W2 values at each step. Score remains 0 as the derivation is self-contained against external OT benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math The 2-Wasserstein distance induces a metric space on probability measures whose geodesics are well-defined
Forward citations
Cited by 2 Pith papers
-
Learning Monge maps with constrained drifting models
A new constrained gradient flow on the space of transport maps converges to the OT map and enables more stable and accurate training of convexity-constrained neural networks for learning Monge maps.
-
Generative Modeling by Minimizing the Wasserstein-2 Loss
Minimizing the W2 loss through a distribution-dependent ODE whose time-marginals form an exponentially convergent gradient flow, discretized via Euler scheme with persistent training that outperforms WGANs in experiments.
Reference graph
Works this paper leans on
- [1]
-
[2]
In: Modelling and optimisation of flows on networks, pp
Ambrosio, L., Gigli, N.: A users guide to optimal transport. In: Modelling and optimisation of flows on networks, pp. 1–155. Springer (2013)
work page 2013
-
[3]
Springer Science & Business Media (2008)
Ambrosio, L., Gigli, N., Savar´ e, G.: Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media (2008)
work page 2008
-
[4]
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[5]
Smooth and Sparse Optimal Transport
Blondel, M., Seguy, V., Rolet, A.: Smooth and sparse optimal transport. arXiv preprint arXiv:1710.06276 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[6]
In: Braverman Readings in Machine Learning
Bottou, L., Arjovsky, M., Lopez-Paz, D., Oquab, M.: Geometrical insights for implicit generative modeling. In: Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, pp. 229–268. Springer (2018)
work page 2018
-
[7]
Communications on pure and applied mathematics 44(4), 375–417 (1991)
Brenier, Y.: Polar factorization and monotone rearrangement of vector- valued functions. Communications on pure and applied mathematics 44(4), 375–417 (1991)
work page 1991
-
[8]
Mathematical Programming 67(1-3), 169– 187 (1994)
Cominetti, R., San Mart´ ın, J.: Asymptotic analysis of the exponential penalty trajectory in linear programming. Mathematical Programming 67(1-3), 169– 187 (1994)
work page 1994
-
[9]
IEEE transactions on pattern analysis and machine intelligence 39(9), 1853–1865 (2017)
Courty, N., Flamary, R., Tuia, D., Rakotomamonjy, A.: Optimal transport for domain adaptation. IEEE transactions on pattern analysis and machine intelligence 39(9), 1853–1865 (2017)
work page 2017
- [10]
- [11]
- [12]
-
[13]
Learning Generative Models with Sinkhorn Divergences
Genevay, A., Peyr´ e, G., Cuturi, M.: Learning generative models with sinkhorn divergences. arXiv preprint arXiv:1706.00292 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
- [14]
-
[15]
In: International Conference on Information Processing in Medical Imaging
Gramfort, A., Peyr´ e, G., Cuturi, M.: Fast optimal transport averaging of neuroimaging data. In: International Conference on Information Processing in Medical Imaging. pp. 261–272. Springer (2015)
work page 2015
- [16]
- [17]
-
[18]
On Convergence and Stability of GANs
Kodali, N., Abernethy, J., Hays, J., Kira, Z.: On convergence and stability of gans. arXiv preprint arXiv:1705.07215 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[19]
A Geometric View of Optimal Transportation and Generative Model
Lei, N., Su, K., Cui, L., Yau, S.T., Gu, D.X.: A geometric view of optimal transportation and generative model. arXiv preprint arXiv:1710.05488 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
- [20]
-
[21]
Lu, G., Zhou, Z., Song, Y., Ren, K., Yu, Y.: Guiding the one-to-one mapping in cyclegan via optimal transport. AAAI (2019)
work page 2019
-
[22]
Implicit Manifold Learning on Generative Adversarial Networks
Lui, K.Y.C., Cao, Y., Gazeau, M., Zhang, K.S.: Implicit manifold learning on generative adversarial networks. arXiv preprint arXiv:1710.11260 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[23]
Mescheder, L., Geiger, A., Nowozin, S.: Which training methods for gans do actually converge? In: ICML. pp. 3478–3487 (2018)
work page 2018
-
[24]
Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. ICLR (2018)
work page 2018
- [25]
- [26]
-
[27]
Petzka, H., Fischer, A., Lukovnikov, D.: On the regularization of wasserstein GANs. In: ICLR (2018)
work page 2018
-
[28]
Foundations and Trends in Machine Learning 11(5-6), 355–607 (2019)
Peyr´ e, G., Cuturi, M., et al.: Computational optimal transport. Foundations and Trends in Machine Learning 11(5-6), 355–607 (2019)
work page 2019
-
[29]
Improving GANs Using Optimal Transport
Salimans, T., Zhang, H., Radford, A., Metaxas, D.: Improving gans using optimal transport. arXiv preprint arXiv:1803.05573 (2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[30]
On the Convergence and Robustness of Training GANs with Regularized Optimal Transport
Sanjabi, M., Ba, J., Razaviyayn, M., D. Lee, J.: On the convergence and robustness of training GANs with regularized optimal transport. arXiv preprint arXiv:1802.08249 (2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[31]
Bulletin of Mathematical Sciences 7(1), 87–154 (2017)
Santambrogio, F.:tEuclidean, metric, and Wassersteinu gradient flows: an overview. Bulletin of Mathematical Sciences 7(1), 87–154 (2017)
work page 2017
-
[32]
Large-Scale Optimal Transport and Mapping Estimation
Seguy, V., Bhushan Damodaran, B., Flamary, R., Courty, N., Rolet, A., Blondel, M.: Large-scale optimal transport and mapping estimation. arXiv preprint arXiv:1711.02283 (2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[33]
ACM Transactions on Graphics (TOG) 34(4), 66 (2015)
Solomon, J., De Goes, F., Peyr´ e, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG) 34(4), 66 (2015)
work page 2015
-
[34]
A Series of comprehensive Studies in Mathematics (2008)
Villani, C.: Optimal transport, old and new. A Series of comprehensive Studies in Mathematics (2008)
work page 2008
-
[35]
A Fast Proximal Point Method for Computing Exact Wasserstein Distance
Xie, Y., Wang, X., Wang, R., Zha, H.: A fast proximal point method for wasserstein distance. arXiv preprint arXiv:1802.04307 (2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[36]
Yamaguchi, S., Koyama, M.: Distributional concavity regularization for gans. In: ICLR (2019)
work page 2019
-
[37]
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017) Adversarial Computation of Optimal Transport Maps 17 A Appendix for Adversarial Computation of Optimal Transport Maps A.1 Remaining questions and future work In the following we provide some f...
-
[38]
Gθt`1 “ Ht,t`1˝ Gθt where Ht,t`1 solves the Monge problem between µθt and µθt`1
-
[39]
Denoting Tt,t`k the unique Monge map between µθt and µθt`k, we have Tt,t`k“ Ht`k´1,t`k˝ ...˝ Ht,t`1 Proof. From Proposition 8 Ht,t`1 exists and is written Ht,t`1“p 1´ αqI` αTt where Tt is the Monge map between µθt and Px. From the fact that Tt“ I´ ∇φt, Ht,t`1 “ I´ α∇φt. Hence Ht,t`1 remains the gradient of a strictly convex function on Rm and thus is a Mo...
-
[40]
From Proposition 13, W2pµθt`1 , µθt`1,φqď ϵ1 ?
-
[41]
We can thus conclude by triangle inequality. A.4 Results in optimal transport theory We develop a bit more the materials of the background section, introducing the same notions with more details in the same order. Monge Problem Optimal transport (OT) theory [34, 2] introduces a natural quantity to distinguish two probability measures. Given two probabilit...
-
[42]
for both φ and ϵ in the discriminator. We note that in the high dimen- sional setting, better training stability and image quality is achieved by using both Leq and Lϵ which complement Lineq in enforcing the constraint. We set λineq“ λeq“ λϵ“ 10 and use the ADAM optimizer with learning rate “ 0.0001 and β1“ 0.5, β2“ 0.999 for both the generator and the di...
work page 2007
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.