Local Bures-Wasserstein Transport: A Practical and Fast Mapping Approximation

Andr\'es Hoyos-Idrobo

arxiv: 1906.08227 · v1 · pith:CNVLUA63new · submitted 2019-06-19 · 📊 stat.ML · cs.LG· stat.AP

Local Bures-Wasserstein Transport: A Practical and Fast Mapping Approximation

Andr\'es Hoyos-Idrobo This is my paper

Pith reviewed 2026-05-25 20:08 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.AP

keywords optimal transporttransport mapWasserstein barycenterGaussian mixture modelBures-Wassersteinapproximationmachine learning

0 comments

The pith

Matching Gaussian components and applying closed-form Bures-Wasserstein maps between pairs produces a fast approximate optimal transport map that generalizes to new points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes to approximate the optimal transport map by fitting Gaussian mixture models to source and target distributions, matching their components, and transporting each pair with the analytic Bures-Wasserstein formula. This construction sidesteps the scaling problems of kernel-based map learners when sample counts grow large. The resulting map evaluates on unseen points and also supplies a parametric form for the Wasserstein barycenter. Experiments on simulated and real data indicate an 80-fold reduction in running time together with recovery of the barycenter support using fewer components than prior methods.

Core claim

By decomposing each density into a Gaussian mixture, matching components across the pair of mixtures, and applying the closed-form Bures-Wasserstein transport to every matched pair, one obtains an approximate global transport map that generalizes out of sample and serves as a practical approximation to the Wasserstein barycenter.

What carries the argument

Local Bures-Wasserstein transport between matched pairs of Gaussian components drawn from the mixture models of the two densities.

If this is right

Overall running time drops by a factor of roughly 80 compared with kernel-based map estimation methods.
Fewer mixture components suffice to recover the support of the Wasserstein barycenter.
The learned map applies immediately to out-of-sample points without retraining.
The procedure integrates directly into standard machine-learning pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local-matching idea could be tried with other families that admit closed-form transport maps.
The speedup may open optimal-transport computations to online or streaming settings where kernel methods become prohibitive.
Because the map is parametric, it could be combined with density-estimation routines that already use mixture models.

Load-bearing premise

That matching Gaussian components across the two densities and transporting each matched pair independently with the Bures-Wasserstein map produces a sufficiently accurate global transport.

What would settle it

A direct comparison on two non-Gaussian mixture distributions where the component-matched map produces large discrepancies in transported mass relative to a ground-truth optimal map computed at high cost.

Figures

Figures reproduced from arXiv: 1906.08227 by Andr\'es Hoyos-Idrobo.

**Figure 1.** Figure 1: Illustration of the working principle of the approximation of the barycenter: For each group g (e.g., heart and clover), the algorithm receives a data matrix Xg . It computes a GMM for each group. It uses the means of these GMMs as an approximation of the geometry. Next, it matches these means.Finally, it computes the Bures-Wasserstein barycenter for each matched pair. Practical computation of the barycen… view at source ↗

**Figure 2.** Figure 2: Performance evaluation. The vertical dashed line denotes k = 20. L-BW requires fewer atoms to recover the support, and it is the fastest method. 5.1 Shape interpolation of clouds of points A straightforward application of Wasserstein barycenter is the interpolation of shapes, as it relies on performing optimal transportation over geometric domains. We vary the number of atoms k ∈ {2, . . . , 100} and evalu… view at source ↗

**Figure 3.** Figure 3: Visualization of barycenters: The coordinates represent the Wasserstein simplex, which consists in all of their Wasserstein barycenters under varying weights λ ∈ ∆3 . Fig.3a displays the barycenters obtained by [27] after thresholding using Otsu’s method. Computation time. We measured the running time of various approaches, including both the training time and the transport time. Fig.2b gives computation t… view at source ↗

**Figure 4.** Figure 4: Computation time taken to reach a solution: Quality of the fit of a `2 penalized logistic regression as a function of the computation time for a fixed number of atoms. (top) Predictive accuracy and (bottom) fairness score on heldout-data. In both datasets, L-BW obtains the best fairness score, and the same prediction score than ME-Gaussian with less computation time to reach a stable solution. The tim… view at source ↗

**Figure 5.** Figure 5: Impact of mixing distributions with L-BW on performance of various classifiers: The performance of classifiers after mixing sensitive groups relative to mean scores of the raw classifier (per dataset). The fairness score, the lower, the better (< 0). The AUC, the higher, the better (> 0). L-BW improves the fairness of logistic regression while maintaining prediction accuracy. For Gradient Bosting, L-BW r… view at source ↗

**Figure 7.** Figure 7: Setting the number of components: The Akaike information criterion (AIC) as a function of the number of atoms for each silhouette. k = 20 achieves the minimum AIC for all shapes. 13 [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Visualization of barycenters for different methods: The coordinates represent the Wasserstein simplex, which consists in all of their Wasserstein barycenters under varying weights λ ∈ ∆3 . (top) row corresponds to k = 10; (bottom) row corresponds to k = 100. 7.2 Statistical-parity fairness experiment 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate 0.0 0.2 0.4 0.6 0.8 1.0 True Positive Rate DP [PITH_FULL_IMA… view at source ↗

**Figure 10.** Figure 10 [PITH_FULL_IMAGE:figures/full_fig_p014_10.png] view at source ↗

**Figure 9.** Figure 9: Support recovery on simulated data: Evaluation of the performance varying the number k of atoms, for barycenters obtained with different weights λ ∈ ∆3 . Coordinates in the Wasserstein simplex represent shapes: cat (λ = [1, 0, 0]), rabbit (λ = [0, 1, 0]), and tooth (λ = [0, 0, 1]). The horizontal dashed line denotes 85% of the agreement. 15 [PITH_FULL_IMAGE:figures/full_fig_p015_9.png] view at source ↗

**Figure 11.** Figure 11: Visualization of the first two principal components for different datasets: We can see that for most of the datasets the groups are well separated. Training a classifier on top of this representation will lead to leak of information about the protected groups. Analysis (LDA), Gradient Boosting Trees (100 trees), Random Forest (100 trees), and Multi Layer Perceptron (MLP, 100 hidden units and ReLu as activ… view at source ↗

**Figure 12.** Figure 12: Setting the number of components: The AIC as a function the number of Gaussian components for all datasets and sensitive conditions. For most of the datasets, the optimal number of components is different for each sensitive group. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_12.png] view at source ↗

**Figure 13.** Figure 13: Impact of mixing distributions with L-BW on performance of various conventional classifiers: The performance of classifiers after mixing groups relative to mean scores of the raw classifier. The fairness score, the lower, the better (< 0). The AUC, the higher, the better (> 0). Mixing groups improves the fairness for most methods while preserving the predictive accuracy. 18 [PITH_FULL_IMAGE:figures/full_… view at source ↗

read the original abstract

Optimal transport (OT)-based methods have a wide range of applications and have attracted a tremendous amount of attention in recent years. However, most of the computational approaches of OT do not learn the underlying transport map. Although some algorithms have been proposed to learn this map, they rely on kernel-based methods, which makes them prohibitively slow when the number of samples increases. Here, we propose a way to learn an approximate transport map and a parametric approximation of the Wasserstein barycenter. We build an approximated transport mapping by leveraging the closed-form of Gaussian (Bures-Wasserstein) transport; we compute local transport plans between matched pairs of the Gaussian components of each density. The learned map generalizes to out-of-sample examples. We provide experimental results on simulated and real data, comparing our proposed method with other mapping estimation algorithms. Preliminary experiments suggest that our proposed method is not only faster, with a factor 80 overall running time, but it also requires fewer components than state-of-the-art methods to recover the support of the barycenter. From a practical standpoint, it is straightforward to implement and can be used with a conventional machine learning pipeline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a straightforward local Bures-Wasserstein map on matched Gaussian mixture components that is faster than kernel methods but rests on thin preliminary experiments.

read the letter

The core contribution is an algorithmic recipe: fit Gaussian mixtures to source and target, match components, and stitch together the closed-form Bures-Wasserstein maps between pairs. This produces an out-of-sample map and a parametric barycenter approximation. The construction itself is not a deep theoretical advance, but the specific choice to localize the Bures-Wasserstein plan this way for map learning is not in the cited prior work and is easy to code into an existing pipeline. That practicality is the main strength. The authors report an 80x runtime reduction and fewer components needed on their examples, which would matter for anyone running OT inside larger models where kernel methods become prohibitive. The method is internally consistent for what it claims—an approximation, not an exact solver—and it avoids the usual kernel scaling issues. The main weakness is the evidence. The abstract mentions simulated and real data comparisons, yet supplies no tables, error bars, dataset sizes, or baseline details. Without those, it is difficult to tell whether the speed and component-count advantages survive different mixture fits, matching heuristics, or harder distributions. The implicit assumption that component-wise maps compose into a reliable global map is plausible but untested in the provided summary, and no error bounds or sensitivity analysis appear. This paper is aimed at people already working on computational OT for machine learning who need a drop-in, faster alternative rather than new theory. A practitioner could try the idea quickly and see if it helps on their data. It is coherent enough and addresses a real bottleneck, so it deserves a serious referee even if the experiments will need strengthening.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Local Bures-Wasserstein Transport, an approximation to the optimal transport map obtained by fitting Gaussian mixture models to source and target measures, matching their components, and composing the closed-form Bures-Wasserstein maps between each matched pair. The resulting parametric map is claimed to generalize to out-of-sample points and to yield a practical approximation to the Wasserstein barycenter that requires fewer components than competing methods. Preliminary experiments on simulated and real data are said to demonstrate an 80-fold reduction in overall running time together with improved component efficiency.

Significance. If the reported speed and component-count advantages are confirmed by properly documented experiments, the method supplies a lightweight, pipeline-friendly alternative to kernel-based transport-map learners that scales to larger sample regimes while remaining straightforward to implement.

major comments (3)

[Experiments] Experiments section: the abstract asserts an overall 80× running-time reduction and fewer components for barycenter support recovery, yet no tables, timing breakdowns, component counts, error bars, or dataset specifications appear; without these quantitative results the central empirical claim cannot be evaluated.
[Method] Method (component-matching paragraph): the procedure used to pair Gaussian components across the two fitted mixtures is stated only at a high level; the cost function, assignment algorithm, and handling of unequal numbers of components are not specified, rendering the local-transport construction non-reproducible and its accuracy unassessable.
[Experiments] Out-of-sample generalization claim: the paper states that the learned map generalizes, but no quantitative hold-out error, comparison against in-sample performance, or baseline map learners on unseen points is reported, which is load-bearing for the practical utility asserted in the abstract.

minor comments (2)

[Method] Notation for the component-wise transport map is introduced without an explicit equation number; adding a displayed equation would improve clarity.
[Introduction] The abstract mentions “state-of-the-art methods” for comparison but does not name them; a brief list in the introduction would help readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address each major comment below and commit to a revised manuscript that incorporates the requested clarifications and additional results.

read point-by-point responses

Referee: [Experiments] Experiments section: the abstract asserts an overall 80× running-time reduction and fewer components for barycenter support recovery, yet no tables, timing breakdowns, component counts, error bars, or dataset specifications appear; without these quantitative results the central empirical claim cannot be evaluated.

Authors: We agree that the current manuscript presents only preliminary experimental results without the detailed quantitative documentation needed to substantiate the claims. In the revision we will add tables reporting timing breakdowns, component counts, error bars, and full dataset specifications to allow proper evaluation of the reported speed and efficiency advantages. revision: yes
Referee: [Method] Method (component-matching paragraph): the procedure used to pair Gaussian components across the two fitted mixtures is stated only at a high level; the cost function, assignment algorithm, and handling of unequal numbers of components are not specified, rendering the local-transport construction non-reproducible and its accuracy unassessable.

Authors: The referee correctly identifies that the component-matching step is described at an insufficient level of detail. We will expand the relevant paragraph to specify the cost function (Bures-Wasserstein distance between fitted Gaussians), the assignment procedure (Hungarian algorithm), and the rule for unequal component counts (e.g., a greedy matching with a distance threshold for unmatched components). revision: yes
Referee: [Experiments] Out-of-sample generalization claim: the paper states that the learned map generalizes, but no quantitative hold-out error, comparison against in-sample performance, or baseline map learners on unseen points is reported, which is load-bearing for the practical utility asserted in the abstract.

Authors: We acknowledge that the manuscript currently asserts out-of-sample generalization without supporting quantitative evidence. The revised version will include dedicated hold-out experiments that report generalization error, in-sample versus out-of-sample comparisons, and direct comparisons against baseline map learners on unseen points. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper constructs an approximate transport map by matching Gaussian mixture components and applying the known closed-form Bures-Wasserstein map between Gaussians. This closed-form is external mathematical knowledge, not derived or fitted within the paper. The central claims are empirical (speed and component count on tested data) rather than a parameter-free derivation that reduces to its own inputs. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citation chains appear in the provided abstract or description. The method is internally consistent for its stated goal of a practical approximation.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The method relies on the existence of a closed-form Bures-Wasserstein map between Gaussians (standard result) and on the ability to match mixture components across densities (domain modeling choice). No new entities are postulated. Number of mixture components is a free modeling choice whose selection affects the reported speed and accuracy.

free parameters (1)

number of Gaussian components
Chosen to recover barycenter support; the abstract states the method requires fewer components than baselines, implying this hyper-parameter is tuned or selected per experiment.

axioms (2)

standard math Bures-Wasserstein distance admits a closed-form optimal transport map between two Gaussians
Invoked to justify the local transport plans; this is a known result from prior literature on Gaussian OT.
domain assumption Gaussian mixture models can be matched component-wise to approximate the global transport
Core modeling step of the proposed local map; not derived in the abstract.

pith-pipeline@v0.9.0 · 5734 in / 1505 out tokens · 22410 ms · 2026-05-25T20:08:47.452363+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 2 internal anchors

[1]

Wasserstein GAN

M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein gan.arXiv preprint arXiv:1701.07875, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[2]

F. R. Bach and M. I. Jordan. Predictive low-rank decomposition for kernel methods. InICML. ACM, 2005

work page 2005
[3]

Bhatia, T

R. Bhatia, T. Jain, and Y. Lim. On the bures–wasserstein distance between positive deﬁnite matrices. Expositiones Mathematicae, 2018

work page 2018
[4]

C. Blake. Uci repository of machine learning databases. http://www. ics. uci. edu/˜ mlearn/MLRepository. html, 1998

work page 1998
[5]

Bonneel, M

N. Bonneel, M. Van De Panne, S. Paris, and W. Heidrich. Displacement interpolation using lagrangian mass transport. InACM Transactions on Graphics (TOG), volume 30. ACM, 2011

work page 2011
[6]

R. E. Burkard, M. Dell’Amico, and S. Martello.Assignment problems. Springer, 2009

work page 2009
[7]

Courty, R

N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy. Optimal transport for domain adaptation. IEEE TPAMI, 39, 2017

work page 2017
[8]

M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InNeurIPS, 2013

work page 2013
[9]

Cuturi and A

M. Cuturi and A. Doucet. Fast computation of wasserstein barycenters. InICML, 2014

work page 2014
[10]

Obtaining fairness using optimal transport theory

E. del Barrio, F. Gamboa, P. Gordaliza, and J.-M. Loubes. Obtaining fairness using optimal transport theory.arXiv preprint arXiv:1806.03195, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[11]

Flamary and N

R. Flamary and N. Courty. Pot python optimal transport library, 2017. URLhttps://github. com/rflamary/POT

work page 2017
[12]

Flamary, M

R. Flamary, M. Cuturi, N. Courty, and A. Rakotomamonjy. Wasserstein discriminant analysis. Machine Learning, 107, 2018

work page 2018
[13]

Friedman, T

J. Friedman, T. Hastie, and R. Tibshirani.The elements of statistical learning, volume 1. Springer series in statistics New York, 2001. 11

work page 2001
[14]

Genevay, G

A. Genevay, G. Peyre, and M. Cuturi. Learning generative models with sinkhorn divergences. In AISTATS, 2018

work page 2018
[15]

C. R. Givens, R. M. Shortt, et al. A class of wasserstein metrics for probability distributions. The Michigan Mathematical Journal, 31, 1984

work page 1984
[16]

Jones, T

E. Jones, T. Oliphant, and P. Peterson.{SciPy}: Open source scientiﬁc tools for{Python}. 2014

work page 2014
[17]

Larson, S

J. Larson, S. Mattu, L. Kirchner, and J. Angwin. How we analyzed the compas recidivism algorithm. ProPublica (5 2016), 9, 2016

work page 2016
[18]

Masarotto, V

V. Masarotto, V. M. Panaretos, and Y. Zemel. Procrustes metrics on covariance operators and optimal transportation of gaussian processes.Sankhya A, 2018

work page 2018
[19]

J. Munkres. Algorithms for the assignment and transportation problems.Journal of the society for industrial and applied mathematics, 5, 1957

work page 1957
[20]

Olfat and A

M. Olfat and A. Aswani. Spectral algorithms for computing fair support vector machines. In AISTATS, 2018

work page 2018
[21]

N. Otsu. A threshold selection method from gray-level histograms.IEEE transactions on systems, man, and cybernetics, 9, 1979

work page 1979
[22]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python.JMLR, 12, 2011

work page 2011
[23]

Perrot, N

M. Perrot, N. Courty, R. Flamary, and A. Habrard. Mapping estimation for discrete optimal transport. In NeurIPS, 2016

work page 2016
[24]

Peyré, M

G. Peyré, M. Cuturi, et al. Computational optimal transport.Foundations and TrendsR⃝ in Machine Learning, 11, 2019

work page 2019
[25]

Rabin, G

J. Rabin, G. Peyré, J. Delon, and M. Bernot. Wasserstein barycenter and its application to texture mixing. InInternational Conference on Scale Space and Variational Methods in Computer Vision. Springer, 2011

work page 2011
[26]

M. A. Schmitz, M. Heitz, N. Bonneel, F. Ngole, D. Coeurjolly, M. Cuturi, G. Peyré, and J.-L. Starck. Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning. SIAM Journal on Imaging Sciences, 11, 2018

work page 2018
[27]

Solomon, F

J. Solomon, F. De Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas. Convolutional wasserstein distances: Eﬃcient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG), 34, 2015

work page 2015
[28]

Takatsu et al

A. Takatsu et al. Wasserstein geometry of gaussian measures.Osaka Journal of Mathematics, 48, 2011

work page 2011
[29]

Tolstikhin, O

I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf. Wasserstein auto-encoders.arXiv preprint arXiv:1711.01558, 2017

work page arXiv 2017
[30]

C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008

work page 2008
[31]

J. Ye, P. Wu, J. Z. Wang, and J. Li. Fast discrete distribution clustering using wasserstein barycenter with sparse support.IEEE Transactions on Signal Processing, 65

work page
[32]

M. B. Zafar, I. Valera, M. G. Rogriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classiﬁcation. InAISTATS, 2017. 12

work page 2017
[33]

Zaharia, R

M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, et al. Apache spark: a uniﬁed engine for big data processing. Communications of the ACM, 59, 2016. 7 Applications: Additional analysis 7.1 Shape interpolation of a cloud of points experiment Figure 6:Input shapes: Cat, rabbit, and tooth. Ea...

work page 2016

[1] [1]

Wasserstein GAN

M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein gan.arXiv preprint arXiv:1701.07875, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[2] [2]

F. R. Bach and M. I. Jordan. Predictive low-rank decomposition for kernel methods. InICML. ACM, 2005

work page 2005

[3] [3]

Bhatia, T

R. Bhatia, T. Jain, and Y. Lim. On the bures–wasserstein distance between positive deﬁnite matrices. Expositiones Mathematicae, 2018

work page 2018

[4] [4]

C. Blake. Uci repository of machine learning databases. http://www. ics. uci. edu/˜ mlearn/MLRepository. html, 1998

work page 1998

[5] [5]

Bonneel, M

N. Bonneel, M. Van De Panne, S. Paris, and W. Heidrich. Displacement interpolation using lagrangian mass transport. InACM Transactions on Graphics (TOG), volume 30. ACM, 2011

work page 2011

[6] [6]

R. E. Burkard, M. Dell’Amico, and S. Martello.Assignment problems. Springer, 2009

work page 2009

[7] [7]

Courty, R

N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy. Optimal transport for domain adaptation. IEEE TPAMI, 39, 2017

work page 2017

[8] [8]

M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InNeurIPS, 2013

work page 2013

[9] [9]

Cuturi and A

M. Cuturi and A. Doucet. Fast computation of wasserstein barycenters. InICML, 2014

work page 2014

[10] [10]

Obtaining fairness using optimal transport theory

E. del Barrio, F. Gamboa, P. Gordaliza, and J.-M. Loubes. Obtaining fairness using optimal transport theory.arXiv preprint arXiv:1806.03195, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[11] [11]

Flamary and N

R. Flamary and N. Courty. Pot python optimal transport library, 2017. URLhttps://github. com/rflamary/POT

work page 2017

[12] [12]

Flamary, M

R. Flamary, M. Cuturi, N. Courty, and A. Rakotomamonjy. Wasserstein discriminant analysis. Machine Learning, 107, 2018

work page 2018

[13] [13]

Friedman, T

J. Friedman, T. Hastie, and R. Tibshirani.The elements of statistical learning, volume 1. Springer series in statistics New York, 2001. 11

work page 2001

[14] [14]

Genevay, G

A. Genevay, G. Peyre, and M. Cuturi. Learning generative models with sinkhorn divergences. In AISTATS, 2018

work page 2018

[15] [15]

C. R. Givens, R. M. Shortt, et al. A class of wasserstein metrics for probability distributions. The Michigan Mathematical Journal, 31, 1984

work page 1984

[16] [16]

Jones, T

E. Jones, T. Oliphant, and P. Peterson.{SciPy}: Open source scientiﬁc tools for{Python}. 2014

work page 2014

[17] [17]

Larson, S

J. Larson, S. Mattu, L. Kirchner, and J. Angwin. How we analyzed the compas recidivism algorithm. ProPublica (5 2016), 9, 2016

work page 2016

[18] [18]

Masarotto, V

V. Masarotto, V. M. Panaretos, and Y. Zemel. Procrustes metrics on covariance operators and optimal transportation of gaussian processes.Sankhya A, 2018

work page 2018

[19] [19]

J. Munkres. Algorithms for the assignment and transportation problems.Journal of the society for industrial and applied mathematics, 5, 1957

work page 1957

[20] [20]

Olfat and A

M. Olfat and A. Aswani. Spectral algorithms for computing fair support vector machines. In AISTATS, 2018

work page 2018

[21] [21]

N. Otsu. A threshold selection method from gray-level histograms.IEEE transactions on systems, man, and cybernetics, 9, 1979

work page 1979

[22] [22]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python.JMLR, 12, 2011

work page 2011

[23] [23]

Perrot, N

M. Perrot, N. Courty, R. Flamary, and A. Habrard. Mapping estimation for discrete optimal transport. In NeurIPS, 2016

work page 2016

[24] [24]

Peyré, M

G. Peyré, M. Cuturi, et al. Computational optimal transport.Foundations and TrendsR⃝ in Machine Learning, 11, 2019

work page 2019

[25] [25]

Rabin, G

J. Rabin, G. Peyré, J. Delon, and M. Bernot. Wasserstein barycenter and its application to texture mixing. InInternational Conference on Scale Space and Variational Methods in Computer Vision. Springer, 2011

work page 2011

[26] [26]

M. A. Schmitz, M. Heitz, N. Bonneel, F. Ngole, D. Coeurjolly, M. Cuturi, G. Peyré, and J.-L. Starck. Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning. SIAM Journal on Imaging Sciences, 11, 2018

work page 2018

[27] [27]

Solomon, F

J. Solomon, F. De Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas. Convolutional wasserstein distances: Eﬃcient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG), 34, 2015

work page 2015

[28] [28]

Takatsu et al

A. Takatsu et al. Wasserstein geometry of gaussian measures.Osaka Journal of Mathematics, 48, 2011

work page 2011

[29] [29]

Tolstikhin, O

I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf. Wasserstein auto-encoders.arXiv preprint arXiv:1711.01558, 2017

work page arXiv 2017

[30] [30]

C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008

work page 2008

[31] [31]

J. Ye, P. Wu, J. Z. Wang, and J. Li. Fast discrete distribution clustering using wasserstein barycenter with sparse support.IEEE Transactions on Signal Processing, 65

work page

[32] [32]

M. B. Zafar, I. Valera, M. G. Rogriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classiﬁcation. InAISTATS, 2017. 12

work page 2017

[33] [33]

Zaharia, R

M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, et al. Apache spark: a uniﬁed engine for big data processing. Communications of the ACM, 59, 2016. 7 Applications: Additional analysis 7.1 Shape interpolation of a cloud of points experiment Figure 6:Input shapes: Cat, rabbit, and tooth. Ea...

work page 2016