
arxiv: 1906.09691 · v1 · pith:STL7XTUEnew · submitted 2019-06-24 · 💻 cs.LG · stat.ML

Adversarial Computation of Optimal Transport Maps

Pith reviewed 2026-05-25 17:48 UTC · model grok-4.3

keywords optimal transport maps · generative adversarial networks · Wasserstein metric · geodesics · adversarial training · continuous distributions · transport maps

The pith

A GAN with a 2-Wasserstein discriminator makes its generator follow the Wasserstein geodesic and produce an optimal transport map at convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a generative adversarial model in which the discriminator's objective is set to the 2-Wasserstein metric. It establishes that the generator's training trajectory follows the W2-geodesic connecting the initial distribution to the target distribution. This path property ensures that the generator at the end of training implements an optimal transport map. The method is demonstrated on low-dimensional toy problems as well as high-dimensional image data, where it exceeds the performance of earlier approaches to learning continuous transport maps.

Core claim

We show that during training, our generator follows the W_2-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training.
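To make the geodesic statement concrete, here is a minimal sketch in one dimension, where the optimal map between two equal-size empirical measures is simply the monotone rearrangement of sorted samples. The helper names (`w2_1d`, `displacement_interpolation`), the Gaussian stand-ins, and the sample size are illustrative assumptions and do not come from the paper; the sketch only shows what "following the $W_2$-geodesic" means, not how the paper's generator is trained.

```python
import numpy as np

def w2_1d(x, y):
    """2-Wasserstein distance between two equal-size 1D empirical measures.

    In one dimension the optimal coupling pairs order statistics, so the
    distance is an L2 norm between the sorted samples.
    """
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

def displacement_interpolation(x, y, t):
    """Samples of the geodesic point mu_t = ((1 - t) I + t T)_# mu_0.

    T is the monotone map sending the i-th smallest x to the i-th smallest y.
    """
    xs, ys = np.sort(x), np.sort(y)
    return (1.0 - t) * xs + t * ys

rng = np.random.default_rng(0)
source = rng.normal(loc=-2.0, scale=0.5, size=2000)  # stand-in for the initial generator
target = rng.normal(loc=3.0, scale=1.5, size=2000)   # stand-in for the data distribution

total = w2_1d(source, target)
for t in (0.0, 0.25, 0.5, 1.0):
    point = displacement_interpolation(source, target, t)
    # Constant-speed geodesic: W2(mu_0, mu_t) should equal t * W2(mu_0, mu_1).
    print(t, w2_1d(source, point), t * total)
```

At `t = 1` the interpolation coincides with the sorted target samples, which is the one-dimensional analogue of the generator reproducing an optimal map at the end of the path.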

What carries the argument

The discriminator objective, set equal to the 2-Wasserstein metric, which forces the generator to trace the geodesic in Wasserstein space.

Load-bearing premise

The adversarial training dynamics with a discriminator objective equal to the 2-Wasserstein metric will cause the generator to follow the geodesic.

What would settle it

An experiment or calculation showing that the sequence of intermediate generator distributions deviates from the W2 geodesic under the proposed training procedure.
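One hedged way to set up such a check is to measure the slack in the triangle inequality: for a snapshot distribution lying on a $W_2$-geodesic between the start and the target, $W_2(\mu_0, \mu_t) + W_2(\mu_t, \nu) - W_2(\mu_0, \nu)$ is zero, and it is strictly positive otherwise. The sketch below does this for 1D samples only; `geodesic_defect` and the synthetic snapshots are hypothetical names and data, not the paper's protocol.

```python
import numpy as np

def w2_1d(x, y):
    """2-Wasserstein distance between equal-size 1D samples (sorted pairing)."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

def geodesic_defect(start, snapshot, target):
    """Triangle-inequality slack; zero when snapshot lies on a W2-geodesic."""
    return w2_1d(start, snapshot) + w2_1d(snapshot, target) - w2_1d(start, target)

rng = np.random.default_rng(1)
start = rng.normal(0.0, 1.0, size=1000)
target = rng.normal(4.0, 2.0, size=1000)

# A snapshot built by displacement interpolation sits on the geodesic.
on_path = 0.5 * np.sort(start) + 0.5 * np.sort(target)
# A snapshot with far too much spread does not.
off_path = rng.normal(2.0, 5.0, size=1000)

defect_on = geodesic_defect(start, on_path, target)
defect_off = geodesic_defect(start, off_path, target)
print(defect_on, defect_off)
```

In higher dimensions the same diagnostic would need a $W_2$ estimator in place of the sorted pairing, and estimation error would then have to be separated from genuine deviation.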

Figures

Figures reproduced from arXiv: 1906.09691 by Aaron Courville, Amjad Almahairi, Jacob Leygonie, Jennifer She, Sai Rajeswar.

Figure 1: 2D data experimental settings. (a) 1024 data samples of …
Figure 2: Maps learned in three synthetic datasets (Figure 1) by Barycentric-OT, …
Figure 3: The discriminator approximates the Monge map locally in W2GAN. (a) …
Figure 4: Evolution of the generator (top row) and the gradient it receives from …
Figure 5: Mapping MV-Gaussian to MNIST. (a) Samples from MV-Gaussian with …
Figure 6: The time evolving generated distribution minimizing its 2-Wasserstein …
Figure 7: The Monge problem. (a) A discrete example of the Monge problem (12) for distributions in $\mathbb{R}^2$. The $\mu$ distribution consists in three equally weighted diracs in $x_1$, $x_2$ and $x_3$, while the $\nu$ one is represented by $y_1$, $y_2$ and $y_3$ in the same way. Black arrows denote the actual optimal transport map $T$. The green arrows together also define a map from $\mu$ onto $\nu$, but it is not optimal. (b) A continuous example of the M…
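The discrete setting described in the Figure 7 caption (three equally weighted Diracs on each side) is small enough to solve by enumeration. The sketch below does so under the squared Euclidean cost; the coordinates are hypothetical, since the figure's actual points are not given in the text, and brute force over permutations is only practical for a handful of points.

```python
from itertools import permutations
import numpy as np

# Hypothetical coordinates (not the figure's): y is x shifted by (2, 0), listed in scrambled order.
x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([[2.0, 1.0], [2.0, 0.0], [3.0, 0.0]])

def monge_cost(perm):
    """Mean squared displacement when x[i] is sent to y[perm[i]]."""
    return float(np.mean(np.sum((x - y[list(perm)]) ** 2, axis=1)))

# With equal weights, a transport map is a permutation; enumerate all 3! of them.
best = min(permutations(range(3)), key=monge_cost)
identity_cost = monge_cost((0, 1, 2))  # a valid but suboptimal map, like the green arrows

print(best, monge_cost(best), identity_cost)
```

Every permutation defines a map from $\mu$ onto $\nu$, which is the point of the caption: feasibility is easy, and only the cost singles out the optimal map.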
Original abstract

Computing optimal transport maps between high-dimensional and continuous distributions is a challenging problem in optimal transport (OT). Generative adversarial networks (GANs) are powerful generative models which have been successfully applied to learn maps across high-dimensional domains. However, little is known about the nature of the map learned with a GAN objective. To address this problem, we propose a generative adversarial model in which the discriminator's objective is the $2$-Wasserstein metric. We show that during training, our generator follows the $W_2$-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training. We validate our approach empirically in both low-dimensional and high-dimensional continuous settings, and show that it outperforms prior methods on image data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a generative adversarial model in which the discriminator objective is set to the 2-Wasserstein metric. It asserts that the resulting training dynamics cause the generator to follow the W_2 geodesic between the initial and target distributions, thereby reproducing an optimal transport map at convergence. Empirical validation is reported on low-dimensional continuous distributions and high-dimensional image data, with claimed outperformance over prior methods.

Significance. If the geodesic-flow claim can be placed on a rigorous footing, the work would establish a direct link between a specific adversarial objective and optimal transport geometry, providing a principled route to high-dimensional OT maps that avoids explicit linear programming or entropic regularization. The reported empirical gains on image data would then constitute evidence of practical utility beyond existing GAN-based transport estimators.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'our generator follows the W_2-geodesic' is stated without any derivation, without the continuous-time ODE that the discrete updates are asserted to approximate, and without the conditions (exact discriminator optimality at every generator step, sufficient network capacity, vanishing step-size) required for the optimality guarantee to hold. This assertion is load-bearing for the paper's main conclusion.
  2. [Theoretical development] Theoretical development (wherever the geodesic property is asserted): no argument is supplied showing that the generator parameter trajectory in distribution space coincides with the W_2 geodesic when the discriminator objective equals W_2; in particular, the manuscript does not address the gap between finite-capacity discriminators, finite step sizes, and the exact Wasserstein gradient flow needed for the 'reproduces an optimal map' statement.
minor comments (1)
  1. The abstract states that the method 'outperforms prior methods on image data' but supplies no quantitative table, no description of the baselines, and no protocol for measuring map quality (e.g., how the learned map is evaluated against a ground-truth transport).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. The two major comments correctly identify that the central geodesic claim is asserted without a full derivation or explicit conditions. We will revise the manuscript to address both points by adding the requested theoretical details.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'our generator follows the W_2-geodesic' is stated without any derivation, without the continuous-time ODE that the discrete updates are asserted to approximate, and without the conditions (exact discriminator optimality at every generator step, sufficient network capacity, vanishing step-size) required for the optimality guarantee to hold. This assertion is load-bearing for the paper's main conclusion.

    Authors: We agree the abstract states the claim concisely without supporting details. In revision we will expand the abstract to note the key assumptions (exact discriminator optimality at each step, sufficient capacity, and vanishing step size) and explicitly reference the new theoretical subsection that derives the continuous-time ODE limit. revision: yes

  2. Referee: [Theoretical development] Theoretical development (wherever the geodesic property is asserted): no argument is supplied showing that the generator parameter trajectory in distribution space coincides with the W_2 geodesic when the discriminator objective equals W_2; in particular, the manuscript does not address the gap between finite-capacity discriminators, finite step sizes, and the exact Wasserstein gradient flow needed for the 'reproduces an optimal map' statement.

    Authors: We acknowledge that the submitted manuscript contains only an informal argument based on the Wasserstein objective and does not supply a rigorous derivation of the parameter trajectory or discuss the finite-capacity / finite-step-size gap. We will add a dedicated subsection deriving the continuous-time Wasserstein gradient flow under the stated conditions and explicitly stating the limitations that arise when those conditions are relaxed. revision: yes
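The appendix fragments extracted in the reference graph further down this page describe an idealized generator update of the form $H_{t,t+1} = (1-\alpha) I + \alpha T_t$, with $T_t$ the Monge map from the current distribution to the target. As a rough illustration of what that idealized step implies, the sketch below iterates it on 1D samples, where $T_t$ is the sorted pairing. The step size `alpha`, the distributions, and the number of steps are assumptions chosen for illustration, and the sketch assumes an exact Monge map at every step, which is precisely the condition the referee flags.

```python
import numpy as np

alpha = 0.2  # hypothetical step size
rng = np.random.default_rng(2)
x0 = np.sort(rng.uniform(-1.0, 1.0, size=500))  # initial samples, kept sorted
y = np.sort(rng.normal(5.0, 1.0, size=500))     # target samples, kept sorted

def w2_sorted(a, b):
    """W2 between equal-size 1D samples that are already sorted."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

x = x0.copy()
distances = [w2_sorted(x, y)]
for _ in range(10):
    # In 1D the Monge map pairs order statistics, so T_t(x_i) = y_i,
    # and the convex combination keeps x sorted.
    x = (1.0 - alpha) * x + alpha * y
    distances.append(w2_sorted(x, y))

# After k steps the iterate should sit at geodesic time s = 1 - (1 - alpha)^k.
s = 1.0 - (1.0 - alpha) ** 10
on_geodesic = (1.0 - s) * x0 + s * y
print(distances[0], distances[-1], np.max(np.abs(x - on_geodesic)))
```

Under these idealized assumptions the distance to the target contracts by a factor $(1-\alpha)$ per step and the iterates never leave the geodesic; whether finite-capacity networks and finite step sizes preserve this is the open question raised above.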

Circularity Check

0 steps flagged

No circularity: geodesic claim follows from W2 objective by standard OT properties

Full rationale

The paper sets the discriminator objective equal to the 2-Wasserstein metric and states that the generator therefore follows the W2 geodesic during training. This is a direct mathematical consequence of the known geodesic property of W2 (not a redefinition or fit). No self-citation chains, uniqueness theorems from prior work, or renaming of empirical patterns appear in the abstract or described derivation. The result is not equivalent to its inputs by construction; the dynamics claim is an independent consequence that could be falsified if the min-max does not produce exact W2 values at each step. Score remains 0 as the derivation is self-contained against external OT benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on the abstract, the work introduces no free parameters, invented entities, or ad-hoc axioms; it relies on standard properties of the 2-Wasserstein metric.

axioms (1)
  • standard math: The 2-Wasserstein distance induces a metric space on probability measures whose geodesics are well-defined
    Invoked to conclude that the generator follows the geodesic during training.
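
For reference, the standard definitions behind this axiom can be written as follows. This is a textbook summary under the usual assumptions (finite second moments, and a source measure $\mu$ that is absolutely continuous so that a Monge map $T$ exists); the notation is chosen here and is not quoted from the paper.

```latex
% 2-Wasserstein distance: infimum over couplings \pi with marginals \mu and \nu
W_2^2(\mu, \nu) \;=\; \inf_{\pi \in \Pi(\mu, \nu)} \int \lVert x - y \rVert^2 \, d\pi(x, y)

% Displacement interpolation along the optimal map T (with T_{\#}\mu = \nu)
\mu_t \;=\; \bigl( (1 - t)\,\mathrm{Id} + t\,T \bigr)_{\#}\, \mu, \qquad t \in [0, 1]

% Constant-speed geodesic property
W_2(\mu_s, \mu_t) \;=\; |t - s| \, W_2(\mu, \nu), \qquad s, t \in [0, 1]
```

The axiom guarantees that such a geodesic exists; it does not by itself say that a particular training procedure moves along it.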

pith-pipeline@v0.9.0 · 5666 in / 1185 out tokens · 33844 ms · 2026-05-25T17:48:09.876877+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning Monge maps with constrained drifting models

    math.OC 2026-03 unverdicted novelty 7.0

    A new constrained gradient flow on the space of transport maps converges to the OT map and enables more stable and accurate training of convexity-constrained neural networks for learning Monge maps.

  2. Generative Modeling by Minimizing the Wasserstein-2 Loss

    stat.ML 2024-06 unverdicted novelty 7.0

    The W2 loss is minimized through a distribution-dependent ODE whose time-marginals form an exponentially convergent gradient flow; the ODE is discretized via an Euler scheme with persistent training, and the resulting method outperforms WGANs in experiments.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · cited by 2 Pith papers · 10 internal anchors

  1. [1]

    In: ICML

    Almahairi, A., Rajeshwar, S., Sordoni, A., Bachman, P., Courville, A.: Augmented cyclegan: Learning many-to-many mappings from unpaired data. In: ICML. pp. 195–204 (2018)

  2. [2]

    In: Modelling and optimisation of flows on networks, pp

    Ambrosio, L., Gigli, N.: A user's guide to optimal transport. In: Modelling and optimisation of flows on networks, pp. 1–155. Springer (2013)

  3. [3]

    Springer Science & Business Media (2008)

    Ambrosio, L., Gigli, N., Savaré, G.: Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media (2008)

  4. [4]

    Wasserstein GAN

    Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)

  5. [5]

    Smooth and Sparse Optimal Transport

    Blondel, M., Seguy, V., Rolet, A.: Smooth and sparse optimal transport. arXiv preprint arXiv:1710.06276 (2017)

  6. [6]

    In: Braverman Readings in Machine Learning

    Bottou, L., Arjovsky, M., Lopez-Paz, D., Oquab, M.: Geometrical insights for implicit generative modeling. In: Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, pp. 229–268. Springer (2018)

  7. [7]

    Communications on pure and applied mathematics 44(4), 375–417 (1991)

    Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Communications on pure and applied mathematics 44(4), 375–417 (1991)

  8. [8]

    Mathematical Programming 67(1-3), 169–187 (1994)

    Cominetti, R., San Martín, J.: Asymptotic analysis of the exponential penalty trajectory in linear programming. Mathematical Programming 67(1-3), 169–187 (1994)

  9. [9]

    IEEE transactions on pattern analysis and machine intelligence 39(9), 1853–1865 (2017)

    Courty, N., Flamary, R., Tuia, D., Rakotomamonjy, A.: Optimal transport for domain adaptation. IEEE transactions on pattern analysis and machine intelligence 39(9), 1853–1865 (2017)

  10. [10]

    In: NIPS

    Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: NIPS. pp. 2292–2300 (2013)

  11. [11]

    In: ICML

    Cuturi, M., Doucet, A.: Fast computation of wasserstein barycenters. In: ICML. pp. 685–693 (2014)

  12. [12]

    In: NIPS

    Genevay, A., Cuturi, M., Peyré, G., Bach, F.: Stochastic optimization for large-scale optimal transport. In: NIPS. pp. 3440–3448 (2016)

  13. [13]

    Learning Generative Models with Sinkhorn Divergences

    Genevay, A., Peyré, G., Cuturi, M.: Learning generative models with sinkhorn divergences. arXiv preprint arXiv:1706.00292 (2017)

  14. [14]

    In: NIPS

    Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS. pp. 2672–2680 (2014)

  15. [15]

    In: International Conference on Information Processing in Medical Imaging

    Gramfort, A., Peyré, G., Cuturi, M.: Fast optimal transport averaging of neuroimaging data. In: International Conference on Information Processing in Medical Imaging. pp. 261–272. Springer (2015)

  16. [16]

    In: NIPS

    Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: NIPS. pp. 5767–5777 (2017)

  17. [17]

    In: ICML

    Johnson, R., Zhang, T.: Composite functional gradient learning of generative adversarial models. In: ICML. pp. 2376–2384 (2018)

  18. [18]

    On Convergence and Stability of GANs

    Kodali, N., Abernethy, J., Hays, J., Kira, Z.: On convergence and stability of gans. arXiv preprint arXiv:1705.07215 (2017)

  19. [19]

    A Geometric View of Optimal Transportation and Generative Model

    Lei, N., Su, K., Cui, L., Yau, S.T., Gu, D.X.: A geometric view of optimal transportation and generative model. arXiv preprint arXiv:1710.05488 (2017)

  20. [20]

    In: NIPS

    Lin, Z., Khetan, A., Fanti, G., Oh, S.: Pacgan: The power of two samples in generative adversarial networks. In: NIPS. pp. 1505–1514 (2018)

  21. [21]

    AAAI (2019)

    Lu, G., Zhou, Z., Song, Y., Ren, K., Yu, Y.: Guiding the one-to-one mapping in cyclegan via optimal transport. AAAI (2019)

  22. [22]

    Implicit Manifold Learning on Generative Adversarial Networks

    Lui, K.Y.C., Cao, Y., Gazeau, M., Zhang, K.S.: Implicit manifold learning on generative adversarial networks. arXiv preprint arXiv:1710.11260 (2017)

  23. [23]

    Mescheder, L., Geiger, A., Nowozin, S.: Which training methods for gans do actually converge? In: ICML. pp. 3478–3487 (2018)

  24. [24]

    ICLR (2018)

    Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. ICLR (2018)

  25. [25]

    In: NIPS

    Nagarajan, V., Kolter, J.Z.: Gradient descent gan optimization is locally stable. In: NIPS. pp. 5585–5595 (2017)

  26. [26]

    In: NIPS

    Nowozin, S., Cseke, B., Tomioka, R.: f-gan: Training generative neural samplers using variational divergence minimization. In: NIPS. pp. 271–279 (2016)

  27. [27]

    In: ICLR (2018)

    Petzka, H., Fischer, A., Lukovnikov, D.: On the regularization of wasserstein GANs. In: ICLR (2018)

  28. [28]

    Foundations and Trends in Machine Learning 11(5-6), 355–607 (2019)

    Peyré, G., Cuturi, M., et al.: Computational optimal transport. Foundations and Trends in Machine Learning 11(5-6), 355–607 (2019)

  29. [29]

    Improving GANs Using Optimal Transport

    Salimans, T., Zhang, H., Radford, A., Metaxas, D.: Improving gans using optimal transport. arXiv preprint arXiv:1803.05573 (2018)

  30. [30]

    On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

    Sanjabi, M., Ba, J., Razaviyayn, M., D. Lee, J.: On the convergence and robustness of training GANs with regularized optimal transport. arXiv preprint arXiv:1802.08249 (2018)

  31. [31]

    Bulletin of Mathematical Sciences 7(1), 87–154 (2017)

    Santambrogio, F.: {Euclidean, metric, and Wasserstein} gradient flows: an overview. Bulletin of Mathematical Sciences 7(1), 87–154 (2017)

  32. [32]

    Large-Scale Optimal Transport and Mapping Estimation

    Seguy, V., Bhushan Damodaran, B., Flamary, R., Courty, N., Rolet, A., Blondel, M.: Large-scale optimal transport and mapping estimation. arXiv preprint arXiv:1711.02283 (2018)

  33. [33]

    ACM Transactions on Graphics (TOG) 34(4), 66 (2015)

    Solomon, J., De Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG) 34(4), 66 (2015)

  34. [34]

    A Series of comprehensive Studies in Mathematics (2008)

    Villani, C.: Optimal transport, old and new. A Series of comprehensive Studies in Mathematics (2008)

  35. [35]

    A Fast Proximal Point Method for Computing Exact Wasserstein Distance

    Xie, Y., Wang, X., Wang, R., Zha, H.: A fast proximal point method for wasserstein distance. arXiv preprint arXiv:1802.04307 (2018)

  36. [36]

    In: ICLR (2019)

    Yamaguchi, S., Koyama, M.: Distributional concavity regularization for gans. In: ICLR (2019)

  37. [37]

    Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017)

    A Appendix for Adversarial Computation of Optimal Transport Maps. A.1 Remaining questions and future work. In the following we provide some f...

  38. [38]

    $G_{\theta_{t+1}} = H_{t,t+1} \circ G_{\theta_t}$ where $H_{t,t+1}$ solves the Monge problem between $\mu_{\theta_t}$ and $\mu_{\theta_{t+1}}$

  39. [39]

    From Proposition 8, $H_{t,t+1}$ exists and is written $H_{t,t+1} = (1-\alpha) I + \alpha T_t$, where $T_t$ is the Monge map between $\mu_{\theta_t}$ and $P_x$

    Denoting $T_{t,t+k}$ the unique Monge map between $\mu_{\theta_t}$ and $\mu_{\theta_{t+k}}$, we have $T_{t,t+k} = H_{t+k-1,t+k} \circ \dots \circ H_{t,t+1}$. Proof. From Proposition 8, $H_{t,t+1}$ exists and is written $H_{t,t+1} = (1-\alpha) I + \alpha T_t$, where $T_t$ is the Monge map between $\mu_{\theta_t}$ and $P_x$. From the fact that $T_t = I - \nabla\varphi_t$, $H_{t,t+1} = I - \alpha\nabla\varphi_t$. Hence $H_{t,t+1}$ remains the gradient of a strictly convex function on $\mathbb{R}^m$ and thus is a Mo...

  40. [40]

    From Proposition 13, $W_2(\mu_{\theta_{t+1}}, \mu_{\theta_{t+1},\varphi}) \le \epsilon_1$ ?

  41. [41]

    transports

    We can thus conclude by triangle inequality. A.4 Results in optimal transport theory We develop a bit more the materials of the background section, introducing the same notions with more details in the same order. Monge Problem Optimal transport (OT) theory [34, 2] introduces a natural quantity to distinguish two probability measures. Given two probabilit...

  42. [42]

    We note that in the high dimensional setting, better training stability and image quality is achieved by using both $L_{eq}$ and $L_\epsilon$ which complement $L_{ineq}$ in enforcing the constraint

    for both $\varphi$ and $\epsilon$ in the discriminator. We note that in the high dimensional setting, better training stability and image quality is achieved by using both $L_{eq}$ and $L_\epsilon$ which complement $L_{ineq}$ in enforcing the constraint. We set $\lambda_{ineq} = \lambda_{eq} = \lambda_\epsilon = 10$ and use the ADAM optimizer with learning rate $= 0.0001$ and $\beta_1 = 0.5$, $\beta_2 = 0.999$ for both the generator and the di...