arxiv: 2605.08546 · v1 · submitted 2026-05-08 · 📊 stat.ML · cs.LG· math.OC

Recognition: 2 theorem links

· Lean Theorem

Sliced Inner Product Gromov-Wasserstein Distances

Xiaoyun Gong , Gabriel Rioux , Ziv Goldfeld

Authors on Pith no claims yet

Pith reviewed 2026-05-12 01:09 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.OC

keywords Gromov-Wasserstein distancesliced distancesinner product costdata alignmentheterogeneous datasetsrotational invariancecomputational efficiency

0 comments

The pith

The sliced inner-product Gromov-Wasserstein distance enables scalable alignment of high-dimensional heterogeneous datasets with rotational invariance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tackles the computational and statistical scaling problems of the Gromov-Wasserstein distance when aligning datasets with different dimensions or geometries. It focuses on the variant using inner product costs and introduces a slicing technique that projects the problem onto one-dimensional lines. This yields a distance with closed-form solutions in one dimension and a natural invariance under rotations. The work studies the properties of this sliced distance and demonstrates its use in clustering text data and comparing language model representations.

Core claim

The paper resolves the one-dimensional closed-form issue for the inner-product Gromov-Wasserstein problem by proposing a sliced IGW distance. This distance is shown to be rotationally invariant, and its structural and computational properties are analyzed in detail, supported by numerical experiments and applications to real data.

What carries the argument

The sliced inner product Gromov-Wasserstein distance, obtained by averaging one-dimensional GW distances with inner product costs over random projections, which carries the alignment while ensuring invariance.

If this is right

The approach scales to high-dimensional data by reducing to one-dimensional problems.
It preserves rotational invariance, making it suitable for data without preferred orientations.
The distance can be used for heterogeneous clustering and representation comparison.
Computational efficiency is achieved through closed-form expressions in the sliced setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar slicing might apply to other GW variants if one-dimensional solutions can be derived.
Applications could extend to more complex machine learning tasks involving distribution matching.
Testing the method on synthetic data with known rotations would verify the invariance property.

Load-bearing premise

That combining the inner product cost with random slicing maintains the essential geometric matching properties of the original GW problem without major loss of information.

What would settle it

Finding two point clouds where the optimal alignment according to full IGW differs substantially from that found by the sliced version.

Figures

Figures reproduced from arXiv: 2605.08546 by Gabriel Rioux, Xiaoyun Gong, Ziv Goldfeld.

**Figure 1.** Figure 1: Validation of the MC error from Lemma 4.8. The estimated value of E[|IGWdm(µ, ν) 2 − IGW(µ, ν) 2 |] follows the theoretical rate O( p log(m)/m) across optimization methods and initializations. The shaded area corresponds to the maximum and minimum values from the 25 realizations. 2 5 2 6 2 7 2 8 2 9 2 10 2 11 2 12 2 13 sample size n 10−3 10−2 10−1 100 101 expected absolute error ∆0 = [Idx 0] | 2 5 2 6 2 7… view at source ↗

**Figure 2.** Figure 2: Validation of the empirical convergence rate from Lemma 4.7. For fixed m, the estimated value of E[|IGWdm,n − IGW(µ, ν)|] scales as C1 + C2 p log(n)/n. The shaded area corresponds to the maximum and minimum values from the 25 realizations. and ν = N (0, Σν) with dx = 5 and dy = 10, where the covariance matrices are randomly generated. In this setting, both IGW(µ, ν) and the projected one-dimensional costs … view at source ↗

**Figure 3.** Figure 3: Comparison of distances between embeddings of sentences from ag-news and amazon-polarity. Each point corresponds to the distance between the embeddings generated by the student and teacher models with the various metrics. The labels indicate the number of layers L and embedding dimension H for the student architecture, abbreviated as L-H. distilled BERT student models from [69]. The teacher has L = 12 tran… view at source ↗

**Figure 4.** Figure 4: Results of user clustering experiment. The 2d representations of the users are obtained via 2d MDS applied to the sliced IGW distance matrix. The top row corresponds to the homogeneous experiment whereas the bottom row is the heterogeneous one. The columns are organized as follows: Left: ground truth clusters, Center left: model used to encode the user texts, Center right: result of clustering based on sli… view at source ↗

**Figure 5.** Figure 5: Illustration of the proof of Item (ii) in Proposition 4.4. For any θ ∈ S d−1 where the identity mapping is strictly optimal, there exists a spherical cap around θ on which the same mapping remains optimal. This optimality extends to the entire cone extending from the origin through that cap. As the characteristic functions of the projected measures coincide along all rays in that cone, they also coincide… view at source ↗

**Figure 6.** Figure 6: Average time required for the 25 runs used to generate the data for the Monte Carlo error plot [PITH_FULL_IMAGE:figures/full_fig_p046_6.png] view at source ↗

**Figure 7.** Figure 7: Average time required for the 25 runs used to generate the data for the sample complexity plot [PITH_FULL_IMAGE:figures/full_fig_p046_7.png] view at source ↗

**Figure 8.** Figure 8: Time required to compute the various metrics between the student and teacher embeddings to generate the data presented in [PITH_FULL_IMAGE:figures/full_fig_p048_8.png] view at source ↗

read the original abstract

The Gromov-Wasserstein (GW) problem provides a framework for aligning heterogeneous datasets by matching their intrinsic geometry, but its statistical and computational scaling remains an issue for high-dimensional problems. Slicing techniques offer an appealing route to scalability, but, unlike Wasserstein distances, GW problems do not generally admit closed-form solutions in one-dimension. We resolve this problem for the GW problem with inner product cost (IGW), propose a sliced IGW distance that enjoys a natural rotational invariance property, and comprehensively study its structural and computational properties. Numerical experiments validating our theory are presented, followed by applications to heterogeneous clustering of text data and language model representation comparison.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a sliced inner-product GW distance with closed-form 1D solutions and rotational invariance to improve scalability, but the key question is how well it preserves original alignments.

read the letter

The main contribution is a sliced version of the inner-product Gromov-Wasserstein distance. For this specific cost, the one-dimensional projections admit closed forms, and the distance ends up rotationally invariant. That combination is new and directly targets the computational bottleneck in high-dimensional GW problems that the abstract flags as open. They follow through with a study of the structural and computational properties plus experiments on text clustering and language model representation comparison. Those experiments appear to show practical gains on the chosen tasks. The rotational invariance is a clean property that falls out naturally here and could be useful in embedding spaces. The soft spot is the one raised in the stress test. Random slicing to one dimension can collapse directions that carry relational structure, and it is not obvious that the resulting couplings stay close to what the full IGW would produce on heterogeneous data. The paper claims to study this, so there are presumably some checks or bounds, but the strength of the evidence depends on how tightly they quantify the distortion. If the experiments only compare against other sliced baselines rather than the unsliced IGW on controlled cases, that would leave the preservation question open. This work is aimed at researchers who apply optimal transport to high-dimensional ML tasks such as representation alignment or clustering. Readers who already use GW distances or sliced Wasserstein variants will see the most immediate value in the closed-form and invariance results. It deserves a serious referee because it offers a concrete, computable proposal for a known limitation and includes both theory and applications. I would recommend sending it out for peer review.

Referee Report

3 major / 2 minor

Summary. The paper proposes a sliced inner-product Gromov-Wasserstein (IGW) distance to address the computational and statistical scaling limitations of standard GW distances in high dimensions. It claims to resolve the lack of closed-form solutions for 1D GW problems when using inner-product costs, derives a sliced IGW that is rotationally invariant, provides a comprehensive theoretical study of its structural and computational properties, validates the theory with numerical experiments, and demonstrates applications to heterogeneous text clustering and language model representation comparison.

Significance. If the central claims hold, the work provides a scalable, closed-form-enabled alternative to GW distances that preserves alignment of intrinsic geometries while adding rotational invariance, which is particularly useful for high-dimensional heterogeneous data tasks. The explicit resolution of the 1D closed-form issue for IGW and the reproducible numerical validation of theory are notable strengths.

major comments (3)

[§3.2, Eq. (7)] §3.2, Eq. (7): The closed-form solution for the 1D IGW distance is derived by reducing to a sorting-based expression, but the subsequent definition of the sliced IGW in Eq. (10) as an average over random projections lacks a quantitative bound on the approximation error to the full high-dimensional IGW; this is load-bearing for the claim that slicing resolves scalability without distorting alignments.
[Theorem 4.1] Theorem 4.1: The rotational invariance property is established, yet the proof does not address whether the optimal couplings recovered from 1D projections match those of the original IGW for heterogeneous data; the skeptic concern about projection-induced collapse of relational structure is not directly tested via a counterexample or stability analysis.
[§5, Table 2] §5, Table 2: The language model comparison experiments report improved clustering metrics for sliced IGW, but without a direct ablation comparing sliced vs. full IGW on a low-dimensional synthetic dataset where the full distance is computable, it is unclear whether key geometric information is preserved.

minor comments (2)

[§3.1] The notation for the projection directions in §3.1 is introduced without an explicit statement that they are drawn uniformly from the unit sphere; this should be clarified for reproducibility.
[Figure 3] Figure 3 caption does not specify the number of random projections used in the sliced distance computation, which affects interpretation of the runtime plots.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. We address each major point below and will revise the paper accordingly to strengthen the presentation and address the concerns.

read point-by-point responses

Referee: [§3.2, Eq. (7)] §3.2, Eq. (7): The closed-form solution for the 1D IGW distance is derived by reducing to a sorting-based expression, but the subsequent definition of the sliced IGW in Eq. (10) as an average over random projections lacks a quantitative bound on the approximation error to the full high-dimensional IGW; this is load-bearing for the claim that slicing resolves scalability without distorting alignments.

Authors: We appreciate the referee highlighting this aspect of the presentation. The closed-form expression in Eq. (7) follows directly from reducing the 1D IGW problem to a sorting-based optimal transport problem under the inner-product cost. For the sliced IGW in Eq. (10), the manuscript emphasizes empirical validation and the fact that the 1D closed form enables efficient computation, following the standard rationale for sliced distances. We acknowledge that an explicit quantitative bound on the approximation error to the full IGW would further support the scalability claims. In the revision we will add a dedicated remark in Section 3 discussing convergence as the number of projections increases, along with a high-probability bound derived via concentration of random projections on the inner-product structure. revision: partial
Referee: [Theorem 4.1] Theorem 4.1: The rotational invariance property is established, yet the proof does not address whether the optimal couplings recovered from 1D projections match those of the original IGW for heterogeneous data; the skeptic concern about projection-induced collapse of relational structure is not directly tested via a counterexample or stability analysis.

Authors: Thank you for this observation. Theorem 4.1 establishes rotational invariance of the sliced IGW distance itself, which follows because random projections commute with orthogonal transformations and the 1D IGW is invariant under sign flips. The optimal couplings used in the sliced formulation are those of the projected 1D problems; these need not coincide exactly with the couplings of the full high-dimensional IGW. This discrepancy is inherent to any projection-based approximation and does not imply collapse of relational structure, since the inner-product cost is preserved under the projections. The manuscript does not contain a counterexample or stability analysis because the primary focus was on metric properties rather than coupling recovery. We will add a short paragraph after Theorem 4.1 noting this distinction and referencing the empirical preservation of alignment observed in the experiments. revision: partial
Referee: [§5, Table 2] §5, Table 2: The language model comparison experiments report improved clustering metrics for sliced IGW, but without a direct ablation comparing sliced vs. full IGW on a low-dimensional synthetic dataset where the full distance is computable, it is unclear whether key geometric information is preserved.

Authors: We agree that this ablation would provide useful additional evidence. While the current experiments focus on high-dimensional heterogeneous data where the full IGW is intractable, we will include a new synthetic experiment in the revised Section 5. Specifically, we will generate low-dimensional (2D and 3D) point clouds with known geometric structure, compute both the full IGW (via standard solvers) and the sliced IGW, and report the difference in recovered alignments and clustering metrics. This will directly demonstrate that the sliced version preserves the essential geometric information. revision: yes

Circularity Check

0 steps flagged

No circularity: sliced IGW defined from standard 1D closed-form IGW and uniform projections

full rationale

The derivation introduces the sliced IGW by averaging one-dimensional inner-product GW distances over random projections on the sphere. This follows directly from the fact that the inner-product cost admits an explicit 1D solution (already established in the GW literature) and the standard construction of sliced distances; no equation reduces the new distance to a fitted parameter, a self-referential definition, or a load-bearing self-citation. Rotational invariance is a direct geometric consequence of the projection measure, not an imported uniqueness theorem. All structural and computational claims are derived from these definitions and validated by independent numerical experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central contribution rests on the assumption that inner-product costs are appropriate for GW and that random slicing yields a valid metric with the stated invariance; no explicit free parameters or new entities beyond the distance definition itself.

axioms (2)

domain assumption Inner product cost is a valid and useful choice for the Gromov-Wasserstein problem
Invoked to enable closed-form 1D solutions as stated in the abstract.
domain assumption Slicing preserves essential geometric alignment properties
Required for the sliced version to serve as a proxy for the full IGW distance.

invented entities (1)

Sliced IGW distance no independent evidence
purpose: Scalable approximation to inner-product GW with closed-form 1D computation and rotational invariance
Newly defined metric whose properties are studied in the paper.

pith-pipeline@v0.9.0 · 5407 in / 1239 out tokens · 49990 ms · 2026-05-12T01:09:18.739344+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

IGW(µ, ν) := (inf_π∈Π(µ,ν) ∬ |⟨x,x′⟩ − ⟨y,y′⟩|² dπ⊗π )^{1/2}
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

closed-form univariate IGW via quantile functions and identity/anti-identity permutations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · 1 internal anchor

[1]

Absil, R

P.-A. Absil, R. Mahony, and R. Sepulchre,Optimization algorithms on matrix manifolds, Princeton University Press, 2008

work page 2008
[2]

23211–23225

Suman Adhya and Debarshi Kumar Sanyal,S2wtm: Spherical sliced-Wasserstein autoencoder for topic modeling, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025, pp. 23211–23225

work page 2025
[3]

Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré,Gradient flows: in metric spaces and in the space of probability measures, Springer Science & Business Media, 2008

work page 2008
[4]

Xavier Aramayo Carrasco, Maksim Nekrashevich, Petr Mokrov, Evgeny Burnaev, and Alexander Korotin, Uncovering challenges of solving the continuous Gromov-Wasserstein problem, arXiv preprint arXiv:2303.05978 (2023)

work page arXiv 2023
[5]

Erhan Bayraktar and Gaoyue Guo,Strong equivalence between metrics of Wasserstein type, Electronic Communi- cations in Probability26(2021), 1 – 13

work page 2021
[6]

2, 1028–1032

Robert Beinert, Cosmas Heiss, and Gabriele Steidl,On assignment problems related to Gromov–Wasserstein distances on the real line, SIAM Journal on Imaging Sciences16(2023), no. 2, 1028–1032

work page 2023
[7]

March T Boedihardjo,Sharp bounds for max-sliced Wasserstein distances, Foundations of Computational Mathe- matics (2025), 1–32

work page 2025
[8]

Bogachev,Measure theory, vol

Vladimir I. Bogachev,Measure theory, vol. 1, Springer, 2007

work page 2007
[9]

32, 1–76

Clément Bonet, Lucas Drumetz, and Nicolas Courty,Sliced-Wasserstein distances and flows on Cartan-Hadamard manifolds, Journal of Machine Learning Research26(2025), no. 32, 1–76

work page 2025
[10]

Nguyen, Shang Hui Koh, and Yong Sheng Soh,Semidefinite relaxations of the Gromov– Wasserstein distance, Advances in Neural Information Processing Systems (A

Junyu Chen, Binh T. Nguyen, Shang Hui Koh, and Yong Sheng Soh,Semidefinite relaxations of the Gromov– Wasserstein distance, Advances in Neural Information Processing Systems (A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, eds.), vol. 37, Curran Associates, Inc., 2024, pp. 69814–69839

work page 2024
[11]

Pengfei Chen, Rongzhen Zhao, Tianjing He, Kongyuan Wei, and Qidong Yang,Unsupervised domain adaptation of bearing fault diagnosis based on join sliced Wasserstein distance, ISA transactions129(2022), 504–519

work page 2022
[12]

4, 757–787

Samir Chowdhury and Facundo Mémoli,The Gromov–Wasserstein distance between networks and stable network invariants, Information and Inference: A Journal of the IMA8(2019), no. 4, 757–787

work page 2019
[13]

Clarke,Optimization and nonsmooth analysis, Society for Industrial and Applied Mathematics, 1990

Frank H. Clarke,Optimization and nonsmooth analysis, Society for Industrial and Applied Mathematics, 1990

work page 1990
[14]

Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian, A discourse-aware attention model for abstractive summarization of long documents, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) ...

work page 2018
[15]

report, Institut de Recherche Mathématiques de Rennes, 2002

Michel Coste,An introduction to semialgebraic geometry, Tech. report, Institut de Recherche Mathématiques de Rennes, 2002. SLICED INNER PRODUCT GROMOV–WASSERSTEIN DISTANCES 27

work page 2002
[16]

Sanjit Dandapanthula, Aleksandr Podkopaev, Shiva Prasad Kasiviswanathan, Aaditya Ramdas, and Ziv Goldfeld, Optimal transportation and alignment between Gaussian measures, arXiv preprint arXiv:2512.03579 (2025)

work page arXiv 2025
[17]

1, 119–154

Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, and Jason D Lee,Stochastic subgradient method converges on tame functions, Foundations of Computational Mathematics20(2020), no. 1, 119–154

work page 2020
[18]

Damek Davis, Dmitriy Drusvyatskiy, Yin Tat Lee, Swati Padmanabhan, and Guanghao Ye,A gradient sampling method with complexity guarantees for lipschitz functions in high and low dimensions, Advances in Neural Information Processing Systems35(2022), 6692–6703

work page 2022
[19]

4, 1178–1198

JulieDelon, AgnesDesolneux, andAntoineSalmona,Gromov–Wasserstein distances between Gaussian distributions, Journal of Applied Probability59(2022), no. 4, 1178–1198

work page 2022
[20]

Forsyth, and Alexander G

Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David A. Forsyth, and Alexander G. Schwing,Max-sliced Wasserstein distance and its use for GANs, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 10648–10656

work page 2019
[21]

Schwing,Generative modeling using the sliced Wasserstein distance, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp

Ishan Deshpande, Ziyu Zhang, and Alexander G. Schwing,Generative modeling using the sliced Wasserstein distance, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3483–3491

work page 2018
[22]

Théo Dumont, Théo Lacombe, and François-Xavier Vialard,On the existence of Monge maps for the Gromov– Wasserstein problem, Foundations of Computational Mathematics (2024), 1–48

work page 2024
[23]

Folland,How to integrate a polynomial over a sphere, The American Mathematical Monthly108(2001), no

Gerald B. Folland,How to integrate a polynomial over a sphere, The American Mathematical Monthly108(2001), no. 5, 446–448

work page 2001
[24]

1, iaad056

Ziv Goldfeld, Kengo Kato, Gabriel Rioux, and Ritwik Sadhu,Statistical inference with regularized optimal transport, Information and Inference: A Journal of the IMA13(2024), no. 1, iaad056

work page 2024
[25]

Antonio Gulli,Ag’s corpus of news articles, 2004,http://groups.di.unipi.it/~gulli/AG_corpus_of_news_ articles.html

work page 2004
[26]

Rajinder Jeet Hans-Gill, Madhu Raka, and Ranjeet Sehmi,Lecture notes on geometry of numbers, Springer, 2024

work page 2024
[27]

Lars Hörmander,The analysis of linear partial differential operators i: Distribution theory and fourier analysis, Springer-Verlag, 1983

work page 1983
[28]

6, 3717–3748

Xiaoyin Hu, Nachuan Xiao, Xin Liu, and Kim-Chuan Toh,A constraint dissolving approach for nonsmooth optimization over the Stiefel manifold, IMA Journal of Numerical Analysis44(2024), no. 6, 3717–3748

work page 2024
[29]

1, 371–413

Wen Huang and Ke Wei,Riemannian proximal gradient methods, Mathematical Programming194(2022), no. 1, 371–413

work page 2022
[30]

1, 193–218

Lawrence Hubert and Phipps Arabie,Comparing partitions, Journal of classification2(1985), no. 1, 193–218

work page 1985
[31]

Venkatkrishna Karumanchi, Gabriel Rioux, and Ziv Goldfeld,Approximation analysis of the entropic penalty in quadratic programming, arXiv preprint arXiv:2509.20031 (2025)

work page arXiv 2025
[32]

Wallach, H

Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Roland Badeau, and Gustavo Rohde,Generalized sliced Wasserstein distances, Advances in Neural Information Processing Systems (H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, eds.), vol. 32, Curran Associates, Inc., 2019

work page 2019
[33]

Pope, Charles E

Soheil Kolouri, Phillip E. Pope, Charles E. Martin, and Gustavo K. Rohde,Sliced Wasserstein auto-encoders, International Conference on Learning Representations (ICLR), 2019

work page 2019
[34]

4, 2385–2401

Siyu Kong and Adrian S Lewis,The cost of nonconvexity in deterministic nonsmooth optimization, Mathematics of Operations Research49(2024), no. 4, 2385–2401

work page 2024
[35]

3519–3529

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton,Similarity of Neural Network represen- tations revisited, International Conference on Machine Learning, PMLR, 2019, pp. 3519–3529

work page 2019
[36]

340, American Mathematical Soc., 2001

Steven George Krantz,Function theory of several complex variables, vol. 340, American Mathematical Soc., 2001

work page 2001
[37]

12164–12203

Khang Le, Dung Q Le, Huy Nguyen, Dat Do, Tung Pham, and Nhat Ho,Entropic Gromov-Wasserstein between Gaussian distributions, International Conference on Machine Learning, PMLR, 2022, pp. 12164–12203

work page 2022
[38]

Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Hassan Sajjad, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan- Major, Philipp Schmid, Sylvain Gugg...

work page 2021
[39]

Jie Li, Dan Xu, and Shaowen Yao,Sliced Wasserstein distance for neural style transfer, Computers & Graphics 102(2022), 89–98

work page 2022
[40]

3, 1605–1634

Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, and Anthony Man-Cho So,Weakly convex optimization over Stiefel manifold using Riemannian subgradient-type methods, SIAM Journal on Optimization31(2021), no. 3, 1605–1634. 28 X. GONG, G. RIOUX, AND Z. GOLDFELD

work page 2021
[41]

Tianyi Lin, Zeyu Zheng, Elynn Chen, Marco Cuturi, and Michael I Jordan,On projection robust optimal transport: Sample complexity and model misspecification, International Conference on Artificial Intelligence and Statistics, PMLR, 2021, pp. 262–270

work page 2021
[42]

Xinran Liu, Elaheh Akbari, Rocio Diaz Martin, Navid NaderiAlizadeh, and Soheil Kolouri,Efficient transferable optimal transport via min-sliced transport plans, arXiv preprint arXiv:2511.19741 (2025)

work page arXiv 2025
[43]

Xinran Liu, Rocio Diaz Martin, Yikun Bai, Ashkan Shahbazi, Matthew Thorpe, Akram Aldroubi, and Soheil Kolouri,Expected sliced transport plans, arXiv preprint arXiv:2410.12176 (2024)

work page arXiv 2024
[44]

2566–2576

Yuzhe Lu, Xinran Liu, Andrea Soltoggio, and Soheil Kolouri,Slosh: Set locality sensitive hashing via sliced- Wasserstein embeddings, Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 2566–2576

work page 2024
[45]

Maas, Raymond E

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts,Learning word vectors for sentiment analysis, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (Portland, Oregon, USA) (Dekang Lin, Yuji Matsumoto, and Rada Mihalcea, eds.), Association for Comp...

work page 2011
[46]

1, 2252 – 2345

Tudor Manole, Sivaraman Balakrishnan, and Larry Wasserman,Minimax confidence intervals for the sliced Wasserstein distance, Electronic Journal of Statistics16(2022), no. 1, 2252 – 2345

work page 2022
[47]

Facundo Mémoli,Gromov–Wasserstein distances and the metric approach to object matching, Foundations of Computational Mathematics11(2011), 417–487

work page 2011
[48]

Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers,MTEB: Massive text embedding benchmark, arXiv preprint arXiv:2210.07316 (2022)

work page internal anchor Pith review arXiv 2022
[49]

Nika Naderializadeh, Danial Salehi, Xinran Liu, and Soheil Kolouri,Constrained sliced Wasserstein embedding, arXiv preprint arXiv:2506.02203 (2025)

work page arXiv 2025
[50]

5470–5474

Kimia Nadjahi, Valentin De Bortoli, Alain Durmus, Roland Badeau, and Umut Şimşekli,Approximate Bayesian computation with the sliced-Wasserstein distance, ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2020, pp. 5470–5474

work page 2020
[51]

Kimia Nadjahi, Alain Durmus, Lénaïc Chizat, Soheil Kolouri, Shahin Shahrampour, and Umut Simsekli,Statistical and topological properties of sliced probability divergences, Advances in Neural Information Processing Systems33 (2020), 20802–20812

work page 2020
[52]

Kimia Nadjahi, Alain Durmus, Pierre E Jacob, Roland Badeau, and Umut Simsekli,Fast approximation of the sliced-Wasserstein distance using concentration of random projections, Advances in Neural Information Processing Systems34(2021), 12411–12424

work page 2021
[53]

Kimia Nadjahi, Alain Durmus, Umut Simsekli, and Roland Badeau,Asymptotic guarantees for learning generative models with the sliced-Wasserstein distance, Advances in Neural Information Processing Systems32(2019)

work page 2019
[54]

Khai Nguyen and Nhat Ho,Amortized projection optimization for sliced Wasserstein generative models, Advances in Neural Information Processing Systems35(2022), 36985–36998

work page 2022
[55]

Khai Nguyen, Hai Nguyen, and Nhat Ho,Fast estimation of Wasserstein distances via regression on sliced Wasserstein distances, arXiv preprint arXiv:2509.20508 (2025)

work page arXiv 2025
[56]

Khai Nguyen, Tongzheng Ren, and Nhat Ho,Markovian sliced Wasserstein distances: Beyond independent projections, Advances in Neural Information Processing Systems36(2023), 39812–39841

work page 2023
[57]

Sloan Nietert, Ziv Goldfeld, Ritwik Sadhu, and Kengo Kato,Statistical, robustness, and computational guarantees for sliced Wasserstein distances, Advances in Neural Information Processing Systems35(2022), 28179–28193

work page 2022
[58]

4, 2663 – 2688

Jonathan Niles-Weed and Philippe Rigollet,Estimation of Wasserstein distances in the spiked transport model, Bernoulli28(2022), no. 4, 2663 – 2688

work page 2022
[59]

2664–2672

Gabriel Peyré, Marco Cuturi, and Justin Solomon,Gromov-Wasserstein averaging of kernel and distance matrices, International conference on machine learning, PMLR, 2016, pp. 2664–2672

work page 2016
[60]

6667, Springer, 2011, pp

Julien Rabin, Gabriel Peyré, Julie Delon, and Marc Bernot,Wasserstein barycenter and its application to texture mixing, Scale Space and Variational Methods in Computer Vision, Lecture Notes in Computer Science, vol. 6667, Springer, 2011, pp. 435–446

work page 2011
[61]

363, 1–52

Gabriel Rioux, Ziv Goldfeld, and Kengo Kato,Entropic Gromov–Wasserstein distances: Stability and algorithms, Journal of Machine Learning Research25(2024), no. 363, 1–52

work page 2024
[62]

,Limit laws for Gromov–Wasserstein alignment with applications to testing graph isomorphisms, arXiv preprint arXiv:2410.18006 (2024)

work page arXiv 2024
[63]

Tyrrell Rockafellar and Roger JB Wets,Variational analysis, Springer, 1998

R. Tyrrell Rockafellar and Roger JB Wets,Variational analysis, Springer, 1998

work page 1998
[64]

87, Springer, 2015

Filippo Santambrogio,Optimal transport for applied mathematicians, vol. 87, Springer, 2015

work page 2015
[65]

19347–19365

Meyer Scetbon, Gabriel Peyré, and Marco Cuturi,Linear-time Gromov Wasserstein distances using low rank couplings and costs, International Conference on Machine Learning, PMLR, 2022, pp. 19347–19365

work page 2022
[66]

SLICED INNER PRODUCT GROMOV–WASSERSTEIN DISTANCES 29

Thibault Séjourné, François-Xavier Vialard, and Gabriel Peyré,The unbalanced gromov wasserstein distance: Conic formulation and relaxation, Advances in Neural Information Processing Systems34(2021), 8766–8779. SLICED INNER PRODUCT GROMOV–WASSERSTEIN DISTANCES 29

work page 2021
[67]

Ashkan Shahbazi, Elaheh Akbari, Darian Salehi, Xinran Liu, Navid NaderiAlizadeh, and Soheil Kolouri,Espformer: Doubly-stochastic attention with expected sliced transport plans, arXiv preprint arXiv:2502.07962 (2025)

work page arXiv 2025
[68]

Justin Solomon, Gabriel Peyré, Vladimir G Kim, and Suvrit Sra,Entropic metric alignment for correspondence problems, ACM Transactions on Graphics (ToG)35(2016), no. 4, 1–13

work page 2016
[69]

Iulia Turc, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova,Well-read students learn better: On the importance of pre-training compact models, International Conference on Learning Representations, 2020

work page 2020
[70]

Titouan Vayer,A contribution to optimal transport on incomparable spaces, 2020, PhD thesis, Institut Polytechnique de Paris

work page 2020
[71]

Titouan Vayer, Rémi Flamary, Romain Tavenard, Laetitia Chapel, and Nicolas Courty,Sliced Gromov-Wasserstein, Advances in Neural Information Processing Systems32(2019)

work page 2019
[72]

Roman Vershynin,Introduction to the non-asymptotic analysis of random matrices, arXiv preprint arXiv:1011.3027 (2010)

work page Pith review arXiv 2010
[73]

,High-dimensional probability: An introduction with applications in data science, 2nd ed., Cambridge University Press, 2026, Available athttps://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-2.pdf

work page 2026
[74]

338, Springer, 2008

Cédric Villani,Optimal transport: old and new, vol. 338, Springer, 2008

work page 2008
[75]

Cédric Vincent-Cuaz, Rémi Flamary, Marco Corneli, Titouan Vayer, and Nicolas Courty,Semi-relaxed Gromov– Wasserstein divergence with applications on graphs, arXiv preprint arXiv:2110.02753 (2021)

work page arXiv 2021
[76]

Forschungsinst

John von Neumann,Some matrix inequalities and metrization of matrix space, Mitt. Forschungsinst. Math. Mech. Kujbyschew-Univ. Tomsk 1, 286-300 (1937)., 1937

work page 1937
[77]

Wainwright,High-dimensional statistics: A non-asymptotic viewpoint, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2019

Martin J. Wainwright,High-dimensional statistics: A non-asymptotic viewpoint, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2019

work page 2019
[78]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush,Transformers: State-of-the-art ...

work page 2020
[79]

Jiaqi Xi and Jonathan Niles-Weed,Distributional convergence of the sliced Wasserstein process, Advances in Neural Information Processing Systems35(2022), 13961–13973

work page 2022
[80]

1, 366–397

Nachuan Xiao, Xin Liu, and Kim-Chuan Toh,Dissolving constraints for Riemannian optimization, Mathematics of Operations Research49(2024), no. 1, 366–397

work page 2024

Showing first 80 references.