Adversarial Computation of Optimal Transport Maps

Aaron Courville; Amjad Almahairi; Jacob Leygonie; Jennifer She; Sai Rajeswar

arxiv: 1906.09691 · v1 · pith:STL7XTUEnew · submitted 2019-06-24 · 💻 cs.LG · stat.ML

Adversarial Computation of Optimal Transport Maps

Jacob Leygonie , Jennifer She , Amjad Almahairi , Sai Rajeswar , Aaron Courville This is my paper

Pith reviewed 2026-05-25 17:48 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords optimal transport mapsgenerative adversarial networksWasserstein metricgeodesicsadversarial trainingcontinuous distributionstransport maps

0 comments

The pith

A GAN with a 2-Wasserstein discriminator makes its generator follow the Wasserstein geodesic and produce an optimal transport map at convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a generative adversarial model in which the discriminator's objective is set to the 2-Wasserstein metric. It establishes that the generator's training trajectory follows the W2-geodesic connecting the initial distribution to the target distribution. This path property ensures that the generator at the end of training implements an optimal transport map. The method is demonstrated on low-dimensional toy problems as well as high-dimensional image data, where it exceeds the performance of earlier approaches to learning continuous transport maps.

Core claim

We show that during training, our generator follows the W_2-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training.

What carries the argument

The discriminator objective equal to the 2-Wasserstein metric, which enforces the generator to trace the geodesic in Wasserstein space.

Load-bearing premise

The adversarial training dynamics with a discriminator objective equal to the 2-Wasserstein metric will cause the generator to follow the geodesic.

What would settle it

An experiment or calculation showing that the sequence of intermediate generator distributions deviates from the W2 geodesic under the proposed training procedure.

Figures

Figures reproduced from arXiv: 1906.09691 by Aaron Courville, Amjad Almahairi, Jacob Leygonie, Jennifer She, Sai Rajeswar.

**Figure 2.** Figure 2: Maps learned in three synthetic datasets (Figure 1) by Barycentric-OT, [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: The discriminator approximates the Monge map locally in W2GAN. (a) [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

**Figure 4.** Figure 4: Evolution of the generator (top row) and the gradient it receives from [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Mapping MV-Gaussian to MNIST. (a) Samples from MV-Gaussian with [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: The time evolving generated distribution minimizing its 2-Wasserstein [PITH_FULL_IMAGE:figures/full_fig_p022_6.png] view at source ↗

**Figure 7.** Figure 7: The Monge problem. (a) A discrete example of the Monge problem (12) for distributions in R2 . The µ distribution consists in three equally weighted diracs in x1, x2 and x3, while the ν one is represented by y1, y2 and y3 in the same way. Black arrows denote the actual optimal transport map T. The green arrows together also define a map from µ onto ν, but it is not optimal. (b) A continuous example of the M… view at source ↗

read the original abstract

Computing optimal transport maps between high-dimensional and continuous distributions is a challenging problem in optimal transport (OT). Generative adversarial networks (GANs) are powerful generative models which have been successfully applied to learn maps across high-dimensional domains. However, little is known about the nature of the map learned with a GAN objective. To address this problem, we propose a generative adversarial model in which the discriminator's objective is the $2$-Wasserstein metric. We show that during training, our generator follows the $W_2$-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training. We validate our approach empirically in both low-dimensional and high-dimensional continuous settings, and show that it outperforms prior methods on image data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sets a W2 discriminator in a GAN so the generator follows the Wasserstein geodesic and ends with an optimal map, but the abstract leaves the derivation and the exactness of that flow unshown.

read the letter

The main takeaway is that they train a generator against a discriminator whose objective is literally the 2-Wasserstein distance, then argue that this forces the generator distribution to travel along the W2 geodesic from its starting point to the target, so the learned map at the end is optimal transport. That link between the discriminator choice and geodesic flow is the piece they present as new, and it is distinct from the usual GAN analyses of what maps get learned. They run experiments in low-dimensional continuous cases and on image data, reporting that the method beats some earlier approaches for map recovery. That empirical section is the part that actually lands: concrete checks in both toy and high-dimensional settings give something to look at even if the theory is thin. The soft spot is exactly the one the stress-test flags. The geodesic claim requires the discriminator to stay optimal at every generator step, the update to be the true Wasserstein gradient, and the discrete steps to stay close to the continuous flow. The abstract states the result without showing the steps or any bound on how much optimization error or finite capacity can be tolerated before the guarantee fails. In real neural-net training those conditions are only approximate, so the “reproduces an optimal map” statement is likely only approximate in practice; the paper would need to address that gap directly. The citation pattern is standard for the OT-plus-GAN area and does not look circular. This is for people working on high-dimensional transport or on making GAN dynamics more geometrically interpretable. A reader who wants a new adversarial route to OT maps will find the experiments useful to build on. It is worth sending to peer review because the idea is cleanly stated, the experiments are there to discuss, and the theory can be tightened in revision rather than rejected outright.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a generative adversarial model in which the discriminator objective is set to the 2-Wasserstein metric. It asserts that the resulting training dynamics cause the generator to follow the W_2 geodesic between the initial and target distributions, thereby reproducing an optimal transport map at convergence. Empirical validation is reported on low-dimensional continuous distributions and high-dimensional image data, with claimed outperformance over prior methods.

Significance. If the geodesic-flow claim can be placed on a rigorous footing, the work would establish a direct link between a specific adversarial objective and optimal transport geometry, providing a principled route to high-dimensional OT maps that avoids explicit linear programming or entropic regularization. The reported empirical gains on image data would then constitute evidence of practical utility beyond existing GAN-based transport estimators.

major comments (2)

[Abstract] Abstract: the central claim that 'our generator follows the W_2-geodesic' is stated without any derivation, without the continuous-time ODE that the discrete updates are asserted to approximate, and without the conditions (exact discriminator optimality at every generator step, sufficient network capacity, vanishing step-size) required for the optimality guarantee to hold. This assertion is load-bearing for the paper's main conclusion.
[Theoretical development] Theoretical development (wherever the geodesic property is asserted): no argument is supplied showing that the generator parameter trajectory in distribution space coincides with the W_2 geodesic when the discriminator objective equals W_2; in particular, the manuscript does not address the gap between finite-capacity discriminators, finite step sizes, and the exact Wasserstein gradient flow needed for the 'reproduces an optimal map' statement.

minor comments (1)

The abstract states that the method 'outperforms prior methods on image data' but supplies no quantitative table, no description of the baselines, and no protocol for measuring map quality (e.g., how the learned map is evaluated against a ground-truth transport).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. The two major comments correctly identify that the central geodesic claim is asserted without a full derivation or explicit conditions. We will revise the manuscript to address both points by adding the requested theoretical details.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'our generator follows the W_2-geodesic' is stated without any derivation, without the continuous-time ODE that the discrete updates are asserted to approximate, and without the conditions (exact discriminator optimality at every generator step, sufficient network capacity, vanishing step-size) required for the optimality guarantee to hold. This assertion is load-bearing for the paper's main conclusion.

Authors: We agree the abstract states the claim concisely without supporting details. In revision we will expand the abstract to note the key assumptions (exact discriminator optimality at each step, sufficient capacity, and vanishing step size) and explicitly reference the new theoretical subsection that derives the continuous-time ODE limit. revision: yes
Referee: [Theoretical development] Theoretical development (wherever the geodesic property is asserted): no argument is supplied showing that the generator parameter trajectory in distribution space coincides with the W_2 geodesic when the discriminator objective equals W_2; in particular, the manuscript does not address the gap between finite-capacity discriminators, finite step sizes, and the exact Wasserstein gradient flow needed for the 'reproduces an optimal map' statement.

Authors: We acknowledge that the submitted manuscript contains only an informal argument based on the Wasserstein objective and does not supply a rigorous derivation of the parameter trajectory or discuss the finite-capacity / finite-step-size gap. We will add a dedicated subsection deriving the continuous-time Wasserstein gradient flow under the stated conditions and explicitly stating the limitations that arise when those conditions are relaxed. revision: yes

Circularity Check

0 steps flagged

No circularity: geodesic claim follows from W2 objective by standard OT properties

full rationale

The paper sets the discriminator objective equal to the 2-Wasserstein metric and states that the generator therefore follows the W2 geodesic during training. This is a direct mathematical consequence of the known geodesic property of W2 (not a redefinition or fit). No self-citation chains, uniqueness theorems from prior work, or renaming of empirical patterns appear in the abstract or described derivation. The result is not equivalent to its inputs by construction; the dynamics claim is an independent consequence that could be falsified if the min-max does not produce exact W2 values at each step. Score remains 0 as the derivation is self-contained against external OT benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on the abstract, the work introduces no free parameters, invented entities, or ad-hoc axioms; it relies on standard properties of the 2-Wasserstein metric.

axioms (1)

standard math The 2-Wasserstein distance induces a metric space on probability measures whose geodesics are well-defined
Invoked to conclude that the generator follows the geodesic during training.

pith-pipeline@v0.9.0 · 5666 in / 1185 out tokens · 33844 ms · 2026-05-25T17:48:09.876877+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning Monge maps with constrained drifting models
math.OC 2026-03 unverdicted novelty 7.0

A new constrained gradient flow on the space of transport maps converges to the OT map and enables more stable and accurate training of convexity-constrained neural networks for learning Monge maps.
Generative Modeling by Minimizing the Wasserstein-2 Loss
stat.ML 2024-06 unverdicted novelty 7.0

Minimizing the W2 loss through a distribution-dependent ODE whose time-marginals form an exponentially convergent gradient flow, discretized via Euler scheme with persistent training that outperforms WGANs in experiments.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · cited by 2 Pith papers · 10 internal anchors

[1]

In: ICML

Almahairi, A., Rajeshwar, S., Sordoni, A., Bachman, P., Courville, A.: Augmented cyclegan: Learning many-to-many mappings from unpaired data. In: ICML. pp. 195–204 (2018)

work page 2018
[2]

In: Modelling and optimisation of ﬂows on networks, pp

Ambrosio, L., Gigli, N.: A users guide to optimal transport. In: Modelling and optimisation of ﬂows on networks, pp. 1–155. Springer (2013)

work page 2013
[3]

Springer Science & Business Media (2008)

Ambrosio, L., Gigli, N., Savar´ e, G.: Gradient ﬂows: in metric spaces and in the space of probability measures. Springer Science & Business Media (2008)

work page 2008
[4]

Wasserstein GAN

Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[5]

Smooth and Sparse Optimal Transport

Blondel, M., Seguy, V., Rolet, A.: Smooth and sparse optimal transport. arXiv preprint arXiv:1710.06276 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[6]

In: Braverman Readings in Machine Learning

Bottou, L., Arjovsky, M., Lopez-Paz, D., Oquab, M.: Geometrical insights for implicit generative modeling. In: Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, pp. 229–268. Springer (2018)

work page 2018
[7]

Communications on pure and applied mathematics 44(4), 375–417 (1991)

Brenier, Y.: Polar factorization and monotone rearrangement of vector- valued functions. Communications on pure and applied mathematics 44(4), 375–417 (1991)

work page 1991
[8]

Mathematical Programming 67(1-3), 169– 187 (1994)

Cominetti, R., San Mart´ ın, J.: Asymptotic analysis of the exponential penalty trajectory in linear programming. Mathematical Programming 67(1-3), 169– 187 (1994)

work page 1994
[9]

IEEE transactions on pattern analysis and machine intelligence 39(9), 1853–1865 (2017)

Courty, N., Flamary, R., Tuia, D., Rakotomamonjy, A.: Optimal transport for domain adaptation. IEEE transactions on pattern analysis and machine intelligence 39(9), 1853–1865 (2017)

work page 2017
[10]

In: NIPS

Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: NIPS. pp. 2292–2300 (2013)

work page 2013
[11]

In: ICML

Cuturi, M., Doucet, A.: Fast computation of wasserstein barycenters. In: ICML. pp. 685–693 (2014)

work page 2014
[12]

In: NIPS

Genevay, A., Cuturi, M., Peyr´ e, G., Bach, F.: Stochastic optimization for large-scale optimal transport. In: NIPS. pp. 3440–3448 (2016)

work page 2016
[13]

Learning Generative Models with Sinkhorn Divergences

Genevay, A., Peyr´ e, G., Cuturi, M.: Learning generative models with sinkhorn divergences. arXiv preprint arXiv:1706.00292 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[14]

In: NIPS

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS. pp. 2672–2680 (2014)

work page 2014
[15]

In: International Conference on Information Processing in Medical Imaging

Gramfort, A., Peyr´ e, G., Cuturi, M.: Fast optimal transport averaging of neuroimaging data. In: International Conference on Information Processing in Medical Imaging. pp. 261–272. Springer (2015)

work page 2015
[16]

In: NIPS

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: NIPS. pp. 5767–5777 (2017)

work page 2017
[17]

In: ICML

Johnson, R., Zhang, T.: Composite functional gradient learning of generative adversarial models. In: ICML. pp. 2376–2384 (2018) 16 J. Leygonie et al

work page 2018
[18]

On Convergence and Stability of GANs

Kodali, N., Abernethy, J., Hays, J., Kira, Z.: On convergence and stability of gans. arXiv preprint arXiv:1705.07215 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[19]

A Geometric View of Optimal Transportation and Generative Model

Lei, N., Su, K., Cui, L., Yau, S.T., Gu, D.X.: A geometric view of optimal transportation and generative model. arXiv preprint arXiv:1710.05488 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[20]

In: NIPS

Lin, Z., Khetan, A., Fanti, G., Oh, S.: Pacgan: The power of two samples in generative adversarial networks. In: NIPS. pp. 1505–1514 (2018)

work page 2018
[21]

AAAI (2019)

Lu, G., Zhou, Z., Song, Y., Ren, K., Yu, Y.: Guiding the one-to-one mapping in cyclegan via optimal transport. AAAI (2019)

work page 2019
[22]

Implicit Manifold Learning on Generative Adversarial Networks

Lui, K.Y.C., Cao, Y., Gazeau, M., Zhang, K.S.: Implicit manifold learning on generative adversarial networks. arXiv preprint arXiv:1710.11260 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[23]

Mescheder, L., Geiger, A., Nowozin, S.: Which training methods for gans do actually converge? In: ICML. pp. 3478–3487 (2018)

work page 2018
[24]

ICLR (2018)

Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. ICLR (2018)

work page 2018
[25]

In: NIPS

Nagarajan, V., Kolter, J.Z.: Gradient descent gan optimization is locally stable. In: NIPS. pp. 5585–5595 (2017)

work page 2017
[26]

In: NIPS

Nowozin, S., Cseke, B., Tomioka, R.: f-gan: Training generative neural samplers using variational divergence minimization. In: NIPS. pp. 271–279 (2016)

work page 2016
[27]

In: ICLR (2018)

Petzka, H., Fischer, A., Lukovnikov, D.: On the regularization of wasserstein GANs. In: ICLR (2018)

work page 2018
[28]

Foundations and Trends in Machine Learning 11(5-6), 355–607 (2019)

Peyr´ e, G., Cuturi, M., et al.: Computational optimal transport. Foundations and Trends in Machine Learning 11(5-6), 355–607 (2019)

work page 2019
[29]

Improving GANs Using Optimal Transport

Salimans, T., Zhang, H., Radford, A., Metaxas, D.: Improving gans using optimal transport. arXiv preprint arXiv:1803.05573 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[30]

On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

Sanjabi, M., Ba, J., Razaviyayn, M., D. Lee, J.: On the convergence and robustness of training GANs with regularized optimal transport. arXiv preprint arXiv:1802.08249 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[31]

Bulletin of Mathematical Sciences 7(1), 87–154 (2017)

Santambrogio, F.:tEuclidean, metric, and Wassersteinu gradient ﬂows: an overview. Bulletin of Mathematical Sciences 7(1), 87–154 (2017)

work page 2017
[32]

Large-Scale Optimal Transport and Mapping Estimation

Seguy, V., Bhushan Damodaran, B., Flamary, R., Courty, N., Rolet, A., Blondel, M.: Large-scale optimal transport and mapping estimation. arXiv preprint arXiv:1711.02283 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[33]

ACM Transactions on Graphics (TOG) 34(4), 66 (2015)

Solomon, J., De Goes, F., Peyr´ e, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional wasserstein distances: Eﬃcient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG) 34(4), 66 (2015)

work page 2015
[34]

A Series of comprehensive Studies in Mathematics (2008)

Villani, C.: Optimal transport, old and new. A Series of comprehensive Studies in Mathematics (2008)

work page 2008
[35]

A Fast Proximal Point Method for Computing Exact Wasserstein Distance

Xie, Y., Wang, X., Wang, R., Zha, H.: A fast proximal point method for wasserstein distance. arXiv preprint arXiv:1802.04307 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[36]

In: ICLR (2019)

Yamaguchi, S., Koyama, M.: Distributional concavity regularization for gans. In: ICLR (2019)

work page 2019
[37]

Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017) Adversarial Computation of Optimal Transport Maps 17 A Appendix for Adversarial Computation of Optimal Transport Maps A.1 Remaining questions and future work In the following we provide some f...

work page arXiv 2017
[38]

Gθt`1 “ Ht,t`1˝ Gθt where Ht,t`1 solves the Monge problem between µθt and µθt`1

work page
[39]

From Proposition 8 Ht,t`1 exists and is written Ht,t`1“p 1´ αqI` αTt where Tt is the Monge map between µθt and Px

Denoting Tt,t`k the unique Monge map between µθt and µθt`k, we have Tt,t`k“ Ht`k´1,t`k˝ ...˝ Ht,t`1 Proof. From Proposition 8 Ht,t`1 exists and is written Ht,t`1“p 1´ αqI` αTt where Tt is the Monge map between µθt and Px. From the fact that Tt“ I´ ∇φt, Ht,t`1 “ I´ α∇φt. Hence Ht,t`1 remains the gradient of a strictly convex function on Rm and thus is a Mo...

work page
[40]

From Proposition 13, W2pµθt`1 , µθt`1,φqď ϵ1 ?

work page
[41]

transports

We can thus conclude by triangle inequality. A.4 Results in optimal transport theory We develop a bit more the materials of the background section, introducing the same notions with more details in the same order. Monge Problem Optimal transport (OT) theory [34, 2] introduces a natural quantity to distinguish two probability measures. Given two probabilit...

work page
[42]

We note that in the high dimen- sional setting, better training stability and image quality is achieved by using both Leq and Lϵ which complement Lineq in enforcing the constraint

for both φ and ϵ in the discriminator. We note that in the high dimen- sional setting, better training stability and image quality is achieved by using both Leq and Lϵ which complement Lineq in enforcing the constraint. We set λineq“ λeq“ λϵ“ 10 and use the ADAM optimizer with learning rate “ 0.0001 and β1“ 0.5, β2“ 0.999 for both the generator and the di...

work page 2007

[1] [1]

In: ICML

Almahairi, A., Rajeshwar, S., Sordoni, A., Bachman, P., Courville, A.: Augmented cyclegan: Learning many-to-many mappings from unpaired data. In: ICML. pp. 195–204 (2018)

work page 2018

[2] [2]

In: Modelling and optimisation of ﬂows on networks, pp

Ambrosio, L., Gigli, N.: A users guide to optimal transport. In: Modelling and optimisation of ﬂows on networks, pp. 1–155. Springer (2013)

work page 2013

[3] [3]

Springer Science & Business Media (2008)

Ambrosio, L., Gigli, N., Savar´ e, G.: Gradient ﬂows: in metric spaces and in the space of probability measures. Springer Science & Business Media (2008)

work page 2008

[4] [4]

Wasserstein GAN

Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[5] [5]

Smooth and Sparse Optimal Transport

Blondel, M., Seguy, V., Rolet, A.: Smooth and sparse optimal transport. arXiv preprint arXiv:1710.06276 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[6] [6]

In: Braverman Readings in Machine Learning

Bottou, L., Arjovsky, M., Lopez-Paz, D., Oquab, M.: Geometrical insights for implicit generative modeling. In: Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, pp. 229–268. Springer (2018)

work page 2018

[7] [7]

Communications on pure and applied mathematics 44(4), 375–417 (1991)

Brenier, Y.: Polar factorization and monotone rearrangement of vector- valued functions. Communications on pure and applied mathematics 44(4), 375–417 (1991)

work page 1991

[8] [8]

Mathematical Programming 67(1-3), 169– 187 (1994)

Cominetti, R., San Mart´ ın, J.: Asymptotic analysis of the exponential penalty trajectory in linear programming. Mathematical Programming 67(1-3), 169– 187 (1994)

work page 1994

[9] [9]

IEEE transactions on pattern analysis and machine intelligence 39(9), 1853–1865 (2017)

Courty, N., Flamary, R., Tuia, D., Rakotomamonjy, A.: Optimal transport for domain adaptation. IEEE transactions on pattern analysis and machine intelligence 39(9), 1853–1865 (2017)

work page 2017

[10] [10]

In: NIPS

Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: NIPS. pp. 2292–2300 (2013)

work page 2013

[11] [11]

In: ICML

Cuturi, M., Doucet, A.: Fast computation of wasserstein barycenters. In: ICML. pp. 685–693 (2014)

work page 2014

[12] [12]

In: NIPS

Genevay, A., Cuturi, M., Peyr´ e, G., Bach, F.: Stochastic optimization for large-scale optimal transport. In: NIPS. pp. 3440–3448 (2016)

work page 2016

[13] [13]

Learning Generative Models with Sinkhorn Divergences

Genevay, A., Peyr´ e, G., Cuturi, M.: Learning generative models with sinkhorn divergences. arXiv preprint arXiv:1706.00292 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[14] [14]

In: NIPS

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS. pp. 2672–2680 (2014)

work page 2014

[15] [15]

In: International Conference on Information Processing in Medical Imaging

Gramfort, A., Peyr´ e, G., Cuturi, M.: Fast optimal transport averaging of neuroimaging data. In: International Conference on Information Processing in Medical Imaging. pp. 261–272. Springer (2015)

work page 2015

[16] [16]

In: NIPS

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: NIPS. pp. 5767–5777 (2017)

work page 2017

[17] [17]

In: ICML

Johnson, R., Zhang, T.: Composite functional gradient learning of generative adversarial models. In: ICML. pp. 2376–2384 (2018) 16 J. Leygonie et al

work page 2018

[18] [18]

On Convergence and Stability of GANs

Kodali, N., Abernethy, J., Hays, J., Kira, Z.: On convergence and stability of gans. arXiv preprint arXiv:1705.07215 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [19]

A Geometric View of Optimal Transportation and Generative Model

Lei, N., Su, K., Cui, L., Yau, S.T., Gu, D.X.: A geometric view of optimal transportation and generative model. arXiv preprint arXiv:1710.05488 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[20] [20]

In: NIPS

Lin, Z., Khetan, A., Fanti, G., Oh, S.: Pacgan: The power of two samples in generative adversarial networks. In: NIPS. pp. 1505–1514 (2018)

work page 2018

[21] [21]

AAAI (2019)

Lu, G., Zhou, Z., Song, Y., Ren, K., Yu, Y.: Guiding the one-to-one mapping in cyclegan via optimal transport. AAAI (2019)

work page 2019

[22] [22]

Implicit Manifold Learning on Generative Adversarial Networks

Lui, K.Y.C., Cao, Y., Gazeau, M., Zhang, K.S.: Implicit manifold learning on generative adversarial networks. arXiv preprint arXiv:1710.11260 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[23] [23]

Mescheder, L., Geiger, A., Nowozin, S.: Which training methods for gans do actually converge? In: ICML. pp. 3478–3487 (2018)

work page 2018

[24] [24]

ICLR (2018)

Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. ICLR (2018)

work page 2018

[25] [25]

In: NIPS

Nagarajan, V., Kolter, J.Z.: Gradient descent gan optimization is locally stable. In: NIPS. pp. 5585–5595 (2017)

work page 2017

[26] [26]

In: NIPS

Nowozin, S., Cseke, B., Tomioka, R.: f-gan: Training generative neural samplers using variational divergence minimization. In: NIPS. pp. 271–279 (2016)

work page 2016

[27] [27]

In: ICLR (2018)

Petzka, H., Fischer, A., Lukovnikov, D.: On the regularization of wasserstein GANs. In: ICLR (2018)

work page 2018

[28] [28]

Foundations and Trends in Machine Learning 11(5-6), 355–607 (2019)

Peyr´ e, G., Cuturi, M., et al.: Computational optimal transport. Foundations and Trends in Machine Learning 11(5-6), 355–607 (2019)

work page 2019

[29] [29]

Improving GANs Using Optimal Transport

Salimans, T., Zhang, H., Radford, A., Metaxas, D.: Improving gans using optimal transport. arXiv preprint arXiv:1803.05573 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[30] [30]

On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

Sanjabi, M., Ba, J., Razaviyayn, M., D. Lee, J.: On the convergence and robustness of training GANs with regularized optimal transport. arXiv preprint arXiv:1802.08249 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[31] [31]

Bulletin of Mathematical Sciences 7(1), 87–154 (2017)

Santambrogio, F.:tEuclidean, metric, and Wassersteinu gradient ﬂows: an overview. Bulletin of Mathematical Sciences 7(1), 87–154 (2017)

work page 2017

[32] [32]

Large-Scale Optimal Transport and Mapping Estimation

Seguy, V., Bhushan Damodaran, B., Flamary, R., Courty, N., Rolet, A., Blondel, M.: Large-scale optimal transport and mapping estimation. arXiv preprint arXiv:1711.02283 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[33] [33]

ACM Transactions on Graphics (TOG) 34(4), 66 (2015)

Solomon, J., De Goes, F., Peyr´ e, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional wasserstein distances: Eﬃcient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG) 34(4), 66 (2015)

work page 2015

[34] [34]

A Series of comprehensive Studies in Mathematics (2008)

Villani, C.: Optimal transport, old and new. A Series of comprehensive Studies in Mathematics (2008)

work page 2008

[35] [35]

A Fast Proximal Point Method for Computing Exact Wasserstein Distance

Xie, Y., Wang, X., Wang, R., Zha, H.: A fast proximal point method for wasserstein distance. arXiv preprint arXiv:1802.04307 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[36] [36]

In: ICLR (2019)

Yamaguchi, S., Koyama, M.: Distributional concavity regularization for gans. In: ICLR (2019)

work page 2019

[37] [37]

Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017) Adversarial Computation of Optimal Transport Maps 17 A Appendix for Adversarial Computation of Optimal Transport Maps A.1 Remaining questions and future work In the following we provide some f...

work page arXiv 2017

[38] [38]

Gθt`1 “ Ht,t`1˝ Gθt where Ht,t`1 solves the Monge problem between µθt and µθt`1

work page

[39] [39]

From Proposition 8 Ht,t`1 exists and is written Ht,t`1“p 1´ αqI` αTt where Tt is the Monge map between µθt and Px

Denoting Tt,t`k the unique Monge map between µθt and µθt`k, we have Tt,t`k“ Ht`k´1,t`k˝ ...˝ Ht,t`1 Proof. From Proposition 8 Ht,t`1 exists and is written Ht,t`1“p 1´ αqI` αTt where Tt is the Monge map between µθt and Px. From the fact that Tt“ I´ ∇φt, Ht,t`1 “ I´ α∇φt. Hence Ht,t`1 remains the gradient of a strictly convex function on Rm and thus is a Mo...

work page

[40] [40]

From Proposition 13, W2pµθt`1 , µθt`1,φqď ϵ1 ?

work page

[41] [41]

transports

We can thus conclude by triangle inequality. A.4 Results in optimal transport theory We develop a bit more the materials of the background section, introducing the same notions with more details in the same order. Monge Problem Optimal transport (OT) theory [34, 2] introduces a natural quantity to distinguish two probability measures. Given two probabilit...

work page

[42] [42]

We note that in the high dimen- sional setting, better training stability and image quality is achieved by using both Leq and Lϵ which complement Lineq in enforcing the constraint

for both φ and ϵ in the discriminator. We note that in the high dimen- sional setting, better training stability and image quality is achieved by using both Leq and Lϵ which complement Lineq in enforcing the constraint. We set λineq“ λeq“ λϵ“ 10 and use the ADAM optimizer with learning rate “ 0.0001 and β1“ 0.5, β2“ 0.999 for both the generator and the di...

work page 2007