Streaming Sliced Optimal Transport

Khai Nguyen

arxiv: 2505.06835 · v4 · submitted 2025-05-11 · 💻 cs.LG · stat.CO · stat.ME · stat.ML

Pith reviewed 2026-05-22 15:42 UTC · model grok-4.3

classification 💻 cs.LG · stat.CO · stat.ME · stat.ML

keywords: streaming data, sliced Wasserstein distance, quantile approximation, optimal transport, streaming estimation, point cloud classification, change point detection, Gaussian mixtures

The pith

Stream-SW estimates sliced Wasserstein distances from data streams using quantile approximations on projections while keeping memory low and error bounded.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first streaming estimator for the sliced Wasserstein distance, called Stream-SW, that processes samples arriving one by one without storing the entire history. It builds this estimator by replacing the usual one-dimensional Wasserstein distance on each projection with a streaming quantile-based version that updates incrementally. Because the method retains only compact quantile summaries rather than the full history, its memory footprint grows far more slowly than the stream itself (the paper states a space complexity that is logarithmic in the number of streaming samples). The authors prove that the approximation error remains controlled under standard assumptions on the quantile estimators and show empirically that the resulting distances are closer to the true sliced Wasserstein values than distances obtained by subsampling the same stream, at least for Gaussian and Gaussian-mixture data.

Core claim

By applying streaming quantile approximation techniques to the one-dimensional Wasserstein distance on each random projection, Stream-SW produces a distance that approximates the sliced Wasserstein distance with provable error bounds and low memory, outperforming random subsampling on streaming samples drawn from Gaussians and mixtures of Gaussians.

What carries the argument

Streaming estimator of the one-dimensional Wasserstein distance obtained by replacing the quantile functions with their streaming approximations and integrating the absolute difference of those approximations.
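As a rough illustration of this construction, the sketch below pairs a simplified multi-level compactor with a midpoint-rule integral of the p-th power of the quantile difference. The compactor is an assumed stand-in for the streaming quantile summaries the paper relies on, not the author's implementation; the names `CompactorSketch` and `stream_w1d` and the parameters `k` and `grid` are hypothetical.

```python
import numpy as np

class CompactorSketch:
    """Simplified multi-level compactor (assumed stand-in for a quantile sketch).

    levels[h] holds items that each represent 2**h original samples.
    k should be even so that compaction conserves total weight.
    """

    def __init__(self, k=128, seed=0):
        self.k = k
        self.levels = [[]]
        self.rng = np.random.default_rng(seed)

    def update(self, x):
        """Insert one sample, compacting any level that reaches capacity k."""
        self.levels[0].append(float(x))
        h = 0
        while len(self.levels[h]) >= self.k:
            buf = sorted(self.levels[h])
            offset = int(self.rng.integers(2))  # random parity keeps ranks unbiased
            if h + 1 == len(self.levels):
                self.levels.append([])
            # Keep every other item; each survivor now carries double weight.
            self.levels[h + 1].extend(buf[offset::2])
            self.levels[h] = []
            h += 1

    def quantiles(self, ts):
        """Approximate quantile function evaluated at probabilities ts."""
        vals = np.array([v for lvl in self.levels for v in lvl])
        wts = np.array([2.0 ** h for h, lvl in enumerate(self.levels) for _ in lvl])
        order = np.argsort(vals)
        vals, wts = vals[order], wts[order]
        cdf = np.cumsum(wts) / wts.sum()
        idx = np.searchsorted(cdf, ts, side="left")
        return vals[np.minimum(idx, len(vals) - 1)]


def stream_w1d(sketch_a, sketch_b, p=2, grid=512):
    """Estimate W_p^p between two 1D streams from their sketches."""
    ts = (np.arange(grid) + 0.5) / grid  # midpoint rule on (0, 1)
    diff = np.abs(sketch_a.quantiles(ts) - sketch_b.quantiles(ts))
    return float(np.mean(diff ** p))


# Usage: feed two streams one sample at a time, then query the distance.
rng = np.random.default_rng(1)
a, b = CompactorSketch(k=128, seed=2), CompactorSketch(k=128, seed=3)
for x, y in zip(rng.normal(0.0, 1.0, 20000), rng.normal(2.0, 1.0, 20000)):
    a.update(x)
    b.update(y)
estimate = stream_w1d(a, b, p=2)  # true W_2^2 between N(0,1) and N(2,1) is 4
```

With equal capacities at every level, the number of stored items grows with the number of levels, that is, roughly logarithmically in the stream length. Production sketches such as the one in reference [23] use more careful capacity schedules.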

If this is right

Stream-SW can be plugged directly into existing sliced-Wasserstein pipelines for point-cloud classification without requiring the full point cloud to reside in memory.
Gradient flows that rely on sliced Wasserstein distances can now run on continuously arriving point clouds.
Change-point detection algorithms that monitor sliced distances become feasible in unbounded data streams.
The low-memory property allows the same estimator to be reused across an arbitrary number of successive distribution comparisons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same streaming quantile machinery could be adapted to other one-dimensional projections or to other integral-probability metrics that admit closed-form expressions on the line.
In online learning settings where distributions drift, Stream-SW offers a lightweight way to track distributional distance without periodic full recomputation.
Because the method's memory grows only slowly with stream length, it may enable sliced-transport computations on edge devices that receive continuous sensor data.

Load-bearing premise

The streaming quantile approximation techniques produce estimates of the one-dimensional Wasserstein distance on each projection that are accurate enough for the overall sliced distance error to remain controlled.

What would settle it

Compare the Stream-SW value obtained from a long stream of samples from two known Gaussians against the analytically computable sliced Wasserstein distance between those same Gaussians; if the observed error exceeds the derived bound for the chosen quantile precision, the guarantee fails.
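A minimal sketch of the reference side of such a check follows, assuming p = 2 and uniformly sampled directions. It uses the standard fact that a Gaussian projected onto a direction θ is a one-dimensional Gaussian, so the per-projection squared distance is the squared mean gap plus the squared standard-deviation gap. The function names are hypothetical, and the exact sort-based estimator stands in for whichever streaming estimate is being tested.

```python
import numpy as np

def sample_directions(num_projections, dim, rng):
    """Draw directions uniformly on the unit sphere."""
    theta = rng.standard_normal((num_projections, dim))
    return theta / np.linalg.norm(theta, axis=1, keepdims=True)

def sw2_gaussian_reference(m1, S1, m2, S2, thetas):
    """Monte Carlo SW_2^2 between N(m1, S1) and N(m2, S2) via 1D closed forms."""
    mean_gap = thetas @ (m1 - m2)
    std1 = np.sqrt(np.einsum("ld,de,le->l", thetas, S1, thetas))
    std2 = np.sqrt(np.einsum("ld,de,le->l", thetas, S2, thetas))
    return float(np.mean(mean_gap ** 2 + (std1 - std2) ** 2))

def sw2_empirical(X, Y, thetas):
    """Sort-based SW_2^2 for two samples of equal size (full-memory baseline)."""
    px = np.sort(X @ thetas.T, axis=0)
    py = np.sort(Y @ thetas.T, axis=0)
    return float(np.mean((px - py) ** 2))

# Usage: an estimate under test is compared with the reference on the same directions.
rng = np.random.default_rng(0)
d = 3
m1, m2, S = np.zeros(d), np.array([3.0, 0.0, 0.0]), np.eye(d)
thetas = sample_directions(1000, d, rng)
ref = sw2_gaussian_reference(m1, S, m2, S, thetas)
X = rng.multivariate_normal(m1, S, size=2000)
Y = rng.multivariate_normal(m2, S, size=2000)
emp = sw2_empirical(X, Y, thetas)
relative_error = abs(emp - ref) / ref
```

Evaluating both quantities on the same set of directions removes the Monte Carlo slicing error from the comparison, so the remaining gap reflects only the one-dimensional estimator.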

Figures

Figures reproduced from arXiv: 2505.06835 by Khai Nguyen.

**Figure 2.** The computational procedure of Stream-SW. In the figure, $\widetilde{W}_p^p(\mu_n, \nu_m; S_{\mu_n,k_1}, S_{\nu_m,k_2})$ is denoted $\widetilde{W}_p^p(\mu_n, \nu_m)$ and $\widehat{\widetilde{SW}}_p^p(\mu_n, \nu_m; k_1, k_2, L)$ is denoted $\widehat{\widetilde{SW}}_p^p(\mu_n, \nu_m; L)$.

**Figure 3.** Relative errors and number of samples when comparing mixtures of Gaussian distributions.

**Figure 4.** Gradient flows from Full SW, SW with random sampling, and Stream-SW in turn.

**Figure 5.** Relative errors and number of samples when comparing Gaussian distributions.

**Figure 6.** Gradient flows from Full SW, SW with random sampling, and Stream-SW in turn (L = 100, k = 100).

**Figure 7.** Gradient flows from Full SW, SW with random sampling, and Stream-SW in turn (L = 100, k = 200).

**Figure 8.** Gradient flows from Full SW, SW with random sampling, and Stream-SW in turn (L = 1000, k = 100).

**Figure 9.** The decision statistics, decision thresholds (null statistics), the true time index (where the change in action happens), and the detected indices for both approaches, i.e., SW (sliding window) and Stream-SW, for the four guests.

Original abstract

Sliced optimal transport (SOT), or sliced Wasserstein (SW) distance, is widely recognized for its statistical and computational scalability. In this work, we further enhance computational scalability by proposing the first method for estimating SW from sample streams, called streaming sliced Wasserstein (Stream-SW). To define Stream-SW, we first introduce a streaming estimator of the one-dimensional Wasserstein distance (1DW). Since the 1DW has a closed-form expression, given by the integral of the absolute difference between the quantile functions of the compared distributions, we leverage quantile approximation techniques for sample streams to define a streaming 1DW estimator. By applying the streaming 1DW to all projections, we obtain Stream-SW. The key advantage of Stream-SW is its low memory complexity while providing theoretical guarantees on the approximation error. We demonstrate that Stream-SW achieves a more accurate approximation of SW than random subsampling, with lower memory consumption, when comparing Gaussian distributions and mixtures of Gaussians from streaming samples. Additionally, we conduct experiments on point cloud classification, point cloud gradient flows, and streaming change point detection to further highlight the favorable performance of the proposed Stream-SW.
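For reference, the two quantities named in the abstract can be written in standard notation as follows. The symbols are our choice rather than a quotation from the paper: $F_\mu^{-1}$ is the quantile function of a one-dimensional measure, $\theta_\sharp\mu$ is the projection of $\mu$ onto direction $\theta$, and $\sigma$ is a slicing distribution on the unit sphere. The abstract's "absolute difference" wording corresponds to $p = 1$.

```latex
W_p^p(\mu, \nu) = \int_0^1 \left| F_\mu^{-1}(t) - F_\nu^{-1}(t) \right|^p \, dt,
\qquad
SW_p^p(\mu, \nu) = \mathbb{E}_{\theta \sim \sigma}
  \left[ W_p^p(\theta_\sharp \mu, \theta_\sharp \nu) \right].
```

A streaming estimator in this spirit replaces each quantile function in the first formula with an approximation maintained from the stream, and the expectation in the second formula with an average over finitely many sampled directions.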

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Stream-SW gives a workable low-memory estimator for sliced Wasserstein on data streams by pairing quantile sketches with projections, but the error bounds look like they need tighter checking on how per-projection approximation errors add up.

The main takeaway is that this paper delivers the first explicit streaming estimator for sliced Wasserstein distance. It builds a streaming one-dimensional Wasserstein estimator from quantile approximation techniques on sample streams, then applies it across random projections to get the sliced version. That combination is new relative to prior sliced OT work, and the low memory complexity is the practical hook for continuous data settings.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Stream-SW as the first streaming estimator for sliced Wasserstein distance. It defines a streaming one-dimensional Wasserstein estimator by applying quantile approximation techniques to sample streams, then aggregates these over random projections. The work claims low memory complexity, theoretical guarantees on approximation error, and better accuracy than random subsampling when comparing streaming samples from Gaussians and Gaussian mixtures, with further experiments on point-cloud classification, gradient flows, and change-point detection.

Significance. If the error propagation from per-projection quantile sketches is rigorously controlled and the empirical advantages persist under matched memory budgets, the method could enable practical use of sliced OT distances in memory-constrained streaming regimes. The reliance on existing quantile-sketch machinery is a direct and potentially useful extension.

major comments (3)

[Abstract] Abstract: the claim of 'theoretical guarantees on the approximation error' is not accompanied by an explicit bound or derivation; the propagation of per-projection streaming-quantile error (typically O(1/sqrt(m)) for fixed sketch size m) through the Monte-Carlo average over the sphere must be shown to remain controlled as stream length grows, especially for piecewise-linear quantile functions arising from mixtures.
[§3] §3 (definition of streaming 1DW estimator): the translation from quantile-sketch approximation error to the L1 integral that defines 1DW is not detailed; without this step it is unclear whether the per-projection error remains bounded independently of the number of projections used in Stream-SW.
[Experiments] Experimental section (Gaussian and mixture comparisons): the superiority claim versus random subsampling requires explicit reporting of sketch size, number of projections, and exact memory footprint for both methods so that the comparison occurs under equivalent resource constraints.

minor comments (2)

[Notation] Notation for the streaming quantile function should be introduced once and used consistently to avoid confusion with the empirical quantile function.
[Discussion] Add a short discussion of how the method behaves when the underlying distributions have heavy tails or discontinuities in their quantile functions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major point below and have revised the manuscript accordingly to improve clarity, rigor, and experimental fairness.

Referee: [Abstract] Abstract: the claim of 'theoretical guarantees on the approximation error' is not accompanied by an explicit bound or derivation; the propagation of per-projection streaming-quantile error (typically O(1/sqrt(m)) for fixed sketch size m) through the Monte-Carlo average over the sphere must be shown to remain controlled as stream length grows, especially for piecewise-linear quantile functions arising from mixtures.

Authors: We agree that making the error bound explicit will strengthen the abstract and introduction. In the revised version we add a short derivation (now referenced from the abstract) showing that the total approximation error of Stream-SW decomposes into a Monte-Carlo term (O(1/sqrt(L))) plus the average per-projection streaming error. Because the quantile sketch size m is fixed, its uniform approximation error remains O(1/sqrt(m)) independently of stream length n; the L1 integral over the sphere therefore stays controlled for any fixed L. For Gaussian-mixture quantiles, which are piecewise linear, the same uniform bound applies because the sketch guarantees hold for any monotone quantile function. We will also add a brief remark on this point in the abstract. revision: yes
Referee: [§3] §3 (definition of streaming 1DW estimator): the translation from quantile-sketch approximation error to the L1 integral that defines 1DW is not detailed; without this step it is unclear whether the per-projection error remains bounded independently of the number of projections used in Stream-SW.

Authors: We will expand the proof of Theorem 1 in §3 to include the missing translation step. Let Q̂_m denote the sketched quantile function with sup-norm error ε_m. The 1DW distance is ∫|Q1(t)−Q2(t)|dt. By the triangle inequality the integral error is at most ε_m·(b−a), where [a,b] is the bounded support of the projected data (or the data diameter after centering). This per-projection bound depends only on m and the support, not on the number of projections L. The final Stream-SW error is then obtained by averaging L independent such terms, preserving the independence from L. The revised §3 will contain the explicit chain of inequalities. revision: yes
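One way to make this chain explicit is sketched below. It rests on assumptions of ours that the source does not confirm: $p = 1$, both projected distributions supported on $[a, b]$, and $\varepsilon_m$ read as a uniform bound on the error of each sketched CDF, $\sup_x |\hat F_i(x) - F_i(x)| \le \varepsilon_m$. The derivation uses the CDF form of the one-dimensional distance, $W_1 = \int_a^b |F_1(x) - F_2(x)|\,dx$.

```latex
\begin{aligned}
\bigl| \widehat{W}_1 - W_1 \bigr|
  &= \left| \int_a^b \bigl| \hat F_1 - \hat F_2 \bigr| \, dx
          - \int_a^b \bigl| F_1 - F_2 \bigr| \, dx \right| \\
  &\le \int_a^b \Bigl( \bigl| \hat F_1 - F_1 \bigr|
          + \bigl| \hat F_2 - F_2 \bigr| \Bigr) \, dx \\
  &\le 2 \, \varepsilon_m \, (b - a).
\end{aligned}
```

The second line follows from $\bigl| |u| - |v| \bigr| \le |u - v|$ and the triangle inequality. The factor 2 appears because both distributions are sketched; if only one side is sketched the bound reduces to $\varepsilon_m (b - a)$. Neither bound involves the number of projections $L$, so averaging over projections leaves it unchanged.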
Referee: [Experiments] Experimental section (Gaussian and mixture comparisons): the superiority claim versus random subsampling requires explicit reporting of sketch size, number of projections, and exact memory footprint for both methods so that the comparison occurs under equivalent resource constraints.

Authors: We accept this criticism and will add a new table (Table 1 in the revised experimental section) that lists, for every reported experiment: (i) sketch size m, (ii) number of projections L, (iii) total memory in number of stored scalars for Stream-SW, and (iv) the exact subsample size used for the random-subsampling baseline chosen so that its memory footprint matches that of Stream-SW. All Gaussian and mixture results will be recomputed and replotted under these matched budgets; the text will explicitly state that the comparisons are now resource-equivalent. revision: yes
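The matched-budget accounting described in this response could be computed along the following lines. This is a hypothetical bookkeeping sketch, not the paper's protocol: it assumes each sketch stores exactly `sketch_size` scalars, that one sketch is kept per projection and per distribution, that projection directions are regenerated from a seed rather than stored, and that the subsampling baseline stores raw `dim`-dimensional points.

```python
def stream_sw_scalars(num_projections, sketch_size):
    """Scalars held by the streaming estimator under the stated assumptions."""
    # Two distributions, one sketch per projection for each.
    return 2 * num_projections * sketch_size

def matched_subsample_size(num_projections, sketch_size, dim):
    """Largest per-distribution subsample that fits in the same scalar budget."""
    budget = stream_sw_scalars(num_projections, sketch_size)
    # The baseline stores two subsamples of dim-dimensional points.
    return budget // (2 * dim)

# Usage: L = 100 projections, sketches of 100 scalars, points in 3 dimensions.
budget = stream_sw_scalars(100, 100)
subsample = matched_subsample_size(100, 100, 3)
```

Reporting both numbers side by side makes it clear whether an accuracy gap comes from the estimator or from an uneven memory allowance.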

Circularity Check

0 steps flagged

No circularity: Stream-SW defined by direct application of existing quantile approximations

The derivation chain begins with the closed-form 1DW integral over quantile functions, then substitutes streaming quantile approximation techniques (external to the paper) to obtain a streaming 1DW estimator, and finally averages over random projections to produce Stream-SW. No equation reduces to a fitted parameter renamed as a prediction, no self-citation is invoked as a uniqueness theorem or load-bearing premise, and the error guarantees are stated to follow from the properties of the imported quantile sketches rather than from any self-referential construction. The central claim therefore remains independent of its own outputs and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the accuracy of quantile approximation techniques applied to streaming samples for each 1D projection; no free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption: Quantile approximation techniques for sample streams produce usable estimates of the one-dimensional Wasserstein distance.
Invoked to define the streaming 1DW estimator before applying it across projections.

pith-pipeline@v0.9.0 · 5729 in / 1152 out tokens · 27372 ms · 2026-05-22T15:42:56.904712+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "We leverage quantile approximation techniques for sample streams to define a streaming 1DW estimator... space complexity of Stream-SW is logarithmic in the number of streaming samples"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "the first streaming version of the SW distance"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Achlioptas, O

P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas. Learning representations and generative models for 3d point clouds. InInternational conference on machine learning, pages 40–49. PMLR, 2018.(Cited on page 1.)

work page 2018

[2] L. Ambrogioni, U. Güçlü, Y. Güçlütürk, M. Hinne, M. A. van Gerven, and E. Maris. Wasserstein variational inference. Advances in Neural Information Processing Systems, 31, 2018. (Cited on page 1.)

[3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017. (Cited on page 1.)

[4] E. Bernton, P. E. Jacob, M. Gerber, and C. P. Robert. Approximate Bayesian computation with the Wasserstein distance. Journal of the Royal Statistical Society, 2019. (Cited on page 1.)

[5] M. T. Boedihardjo. Sharp bounds for max-sliced Wasserstein distances. Foundations of Computational Mathematics, pages 1–32, 2025. (Cited on page 5.)

[6] C. Bonet, P. Berg, N. Courty, F. Septier, L. Drumetz, and M.-T. Pham. Spherical sliced-Wasserstein. International Conference on Learning Representations, 2023. (Cited on page 12.)

[7] C. Bonet, N. Courty, F. Septier, and L. Drumetz. Efficient gradient flows in sliced-Wasserstein space. Transactions on Machine Learning Research, 2022. (Cited on page 2.)

[8] C. Bonet, L. Drumetz, and N. Courty. Sliced-Wasserstein distances and flows on Cartan-Hadamard manifolds. Journal of Machine Learning Research, 26(32):1–76, 2025. (Cited on page 12.)

[9] N. Bonneel, J. Rabin, G. Peyré, and H. Pfister. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 1(51):22–45, 2015. (Cited on pages 1, 4, and 9.)

[10] N. Bonnotte. Unidimensional and evolution methods for optimal transportation. PhD thesis, Paris 11, 2013. (Cited on page 4.)

[11] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015. (Cited on page 12.)

[12] H. Chen and H.-G. Müller. Sliced Wasserstein regression. arXiv preprint arXiv:2306.10601, 2023. (Cited on page 2.)

[13] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013. (Cited on page 1.)

[14] I. Deshpande, Y.-T. Hu, R. Sun, A. Pyrros, N. Siddiqui, S. Koyejo, Z. Zhao, D. Forsyth, and A. G. Schwing. Max-sliced Wasserstein distance and its use for GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10648–10656, 2019. (Cited on page 29.)

[15] I. Deshpande, Z. Zhang, and A. G. Schwing. Generative modeling using the sliced Wasserstein distance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3483–3491, 2018. (Cited on page 2.)

[16] R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier, L. Gautheron, N. T. Gayraud, H. Janati, A. Rakotomamonjy, I. Redko, A. Rolet, A. Schutz, V. Seguy, D. J. Sutherland, R. Tavenard, A. Tong, and T. Vayer. POT: Python optimal transport. Journal of Machine Learning Research, 22(78):1–8...

[17] S. Fothergill, H. Mentis, P. Kohli, and S. Nowozin. Instructing people for training gestural interactive systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1737–1746, 2012. (Cited on page 12.)

[18] N. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3):707–738, 2015. (Cited on pages 1 and 4.)

[19] Z. Goldfeld, K. Kato, G. Rioux, and R. Sadhu. Statistical inference with regularized optimal transport. Information and Inference: A Journal of the IMA, 13(1):iaad056, 2024. (Cited on page 21.)

[20] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. ACM SIGMOD Record, 30(2):58–66, 2001. (Cited on page 5.)

[21] S. Helgason. The Radon transform on R^n. In Integral Geometry and Radon Transforms, pages 1–62. Springer, 2011. (Cited on pages 1 and 4.)

[22] X. Hu and Z. Lin. Two-sample distribution tests in high dimensions via max-sliced Wasserstein distance and bootstrapping. Biometrika, page asaf001, 2025. (Cited on page 2.)

[23] Z. Karnin, K. Lang, and E. Liberty. Optimal quantile approximation in streams. In IEEE Symposium on Foundations of Computer Science (FOCS), pages 71–78. IEEE, 2016. (Cited on pages 5 and 6.)

[24] S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau, and G. Rohde. Generalized sliced Wasserstein distances. In Advances in Neural Information Processing Systems, pages 261–272, 2019. (Cited on pages 12 and 29.)

[25] S. Kolouri, G. K. Rohde, and H. Hoffmann. Sliced Wasserstein distance for learning Gaussian mixture models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3427–3436, 2018. (Cited on page 2.)

[26] C.-Y. Lee, T. Batra, M. H. Baig, and D. Ulbricht. Sliced Wasserstein discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10285–10295, 2019. (Cited on page 2.)

[27] R. Leluc, A. Dieuleveut, F. Portier, J. Segers, and A. Zhuman. Sliced-Wasserstein estimation with spherical harmonics as control variates. In Proceedings of the 41st International Conference on Machine Learning, 2024. (Cited on page 4.)

[28] T. Li, C. Meng, H. Xu, and J. Yu. Hilbert curve projection distance for distribution comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7):4993–5007, 2024. (Cited on page 11.)

[29] Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023. (Cited on page 1.)

[30] A. Liutkus, U. Simsekli, S. Majewski, A. Durmus, and F.-R. Stöter. Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions. In International Conference on Machine Learning, pages 4104–4113. PMLR, 2019. (Cited on page 2.)

[31] M. Luong, K. Nguyen, N. Ho, R. Haf, D. Phung, and L. Qu. Revisiting deep audio-text retrieval through the lens of transportation. In The Twelfth International Conference on Learning Representations, 2024. (Cited on page 1.)

[32] T. Manole, S. Balakrishnan, and L. Wasserman. Minimax confidence intervals for the sliced Wasserstein distance. Electronic Journal of Statistics, 16(1):2252–2345, 2022. (Cited on page 5.)

[33] A. Mensch and G. Peyré. Online Sinkhorn: Optimal transport distances from sample streams. Advances in Neural Information Processing Systems, 33:1657–1667, 2020. (Cited on page 2.)

[34] K. Nadjahi, V. De Bortoli, A. Durmus, R. Badeau, and U. Şimşekli. Approximate Bayesian computation with the sliced-Wasserstein distance. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5470–5474. IEEE, 2020. (Cited on page 2.)

[35] K. Nadjahi, A. Durmus, L. Chizat, S. Kolouri, S. Shahrampour, and U. Simsekli. Statistical and topological properties of sliced probability divergences. Advances in Neural Information Processing Systems, 33:20802–20812, 2020. (Cited on page 5.)

[36] K. Nguyen. An introduction to sliced optimal transport: Foundations, advances, extensions, and applications. Foundations and Trends® in Computer Graphics and Vision, 17(3-4):171–406, 2025. (Cited on pages 1 and 4.)

[37] K. Nguyen, N. Bariletto, and N. Ho. Quasi-Monte Carlo for 3D sliced Wasserstein. In The Twelfth International Conference on Learning Representations, 2024. (Cited on pages 4 and 9.)

[38] K. Nguyen, N. Ho, T. Pham, and H. Bui. Distributional sliced-Wasserstein and applications to generative modeling. In International Conference on Learning Representations, 2021. (Cited on page 5.)

[39] K. Nguyen and P. Mueller. Summarizing Bayesian nonparametric mixture posterior–sliced optimal transport metrics for Gaussian mixtures. arXiv preprint arXiv:2411.14674, 2024. (Cited on pages 2 and 12.)

[40] K. Nguyen, Y. Ni, and P. Mueller. Bayesian density-density regression with application to cell-cell communications. arXiv preprint arXiv:2504.12617, 2025. (Cited on page 2.)

[41] S. Nietert, R. Sadhu, Z. Goldfeld, and K. Kato. Statistical, robustness, and computational guarantees for sliced Wasserstein distances. Advances in Neural Information Processing Systems, 2022. (Cited on page 5.)

[42] O. Pele and M. Werman. Fast and robust earth mover’s distances. In 2009 IEEE 12th International Conference on Computer Vision, pages 460–467. IEEE, September 2009. (Cited on page 1.)

[43] G. Peyré and M. Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019. (Cited on pages 1, 3, and 4.)

[44] A.-A. Pooladian, H. Ben-Hamu, C. Domingo-Enrich, B. Amos, Y. Lipman, and R. T. Chen. Multisample flow matching: Straightening flows with minibatch couplings. In International Conference on Machine Learning, pages 28100–28127. PMLR, 2023. (Cited on page 1.)

[45] J. Rabin, S. Ferradans, and N. Papadakis. Adaptive color transfer with relaxed optimal transport. In 2014 IEEE International Conference on Image Processing (ICIP), pages 4852–4856. IEEE, 2014. (Cited on page 4.)

[46] F. Santambrogio. Optimal transport for applied mathematicians. Birkhäuser, NY, 55(58-63):94, 2015. (Cited on page 12.)

[47] L. Shi, J. Fan, and J. Yan. OT-CLIP: Understanding and generalizing CLIP via optimal transport. In Forty-first International Conference on Machine Learning, 2024. (Cited on page 1.)

[48] K. Sisouk, J. Delon, and J. Tierny. A user guide to sampling strategies for sliced optimal transport. arXiv preprint arXiv:2502.02275, 2025. (Cited on pages 4 and 9.)

[49] S. Srivastava, C. Li, and D. B. Dunson. Scalable Bayes via barycenter in Wasserstein space. Journal of Machine Learning Research, 19(8):1–35, 2018. (Cited on page 1.)

[50] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf. Wasserstein auto-encoders. In International Conference on Learning Representations, 2018. (Cited on page 1.)

[51] A. Tong, K. Fatras, N. Malkin, G. Huguet, Y. Zhang, J. Rector-Brooks, G. Wolf, and Y. Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. Transactions on Machine Learning Research, 2024. Expert Certification. (Cited on page 1.)

[52] H. Tran, Y. Bai, A. Kothapalli, A. Shahbazi, X. Liu, R. D. Martin, and S. Kolouri. Stereographic spherical sliced Wasserstein distances. International Conference on Machine Learning, 2024. (Cited on page 12.)

[53] C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008. (Cited on pages 1 and 3.)

[54] F. Wang, C. Poon, and T. Shardlow. Compressed online Sinkhorn. arXiv preprint arXiv:2310.05019, 2023. (Cited on page 2.)

[55] J. Wang, R. Gao, and Y. Xie. Two-sample test with kernel projected Wasserstein distance. In International Conference on Artificial Intelligence and Statistics, pages 8022–8055. PMLR, 2022. (Cited on pages 12 and 29.)

[56] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015. (Cited on page 11.)

[57] M. Yi and S. Liu. Sliced Wasserstein variational inference. In Fourth Symposium on Advances in Approximate Bayesian Inference, 2021. (Cited on page 2.)

Supplement to “Streaming Sliced Optimal Transport”

We first provide skipped technical proofs in Appendix A. We then provide additional materials in Appendix B. Additional experimental results in stream...

in Figure 7, and (L = 1000, k = 100) in Figure 8. The qualitative comparison is consistent with

Figure 6: Gradient flows from Full SW, SW with random sampling, and Stream-SW in turn (L = 100, k = 100).

Figure 7: Gradient flows from Full SW, SW with random sampling, and Stream-SW in turn (L = 100, k = 200).

Figure 8: Gradient flows from Full SW, SW with random...