arXiv: 2604.02610 · v1 · submitted 2026-04-03 · stat.ML · cs.LG

Structure-Preserving Multi-View Embedding Using Gromov-Wasserstein Optimal Transport

Rafael Pereira Eufrazio, Eduardo Fernandes Montesuma, Charles Casimiro Cavalcante

Pith reviewed 2026-05-13 19:07 UTC · model grok-4.3


keywords multi-view embedding · Gromov-Wasserstein · optimal transport · multidimensional scaling · manifold learning · relational structure


The pith

Gromov-Wasserstein optimal transport yields multi-view embeddings that preserve relational structure across heterogeneous views.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two methods that use Gromov-Wasserstein optimal transport to align distance matrices from multiple data views and recover a shared low-dimensional embedding. Mean-GWMDS averages the distance matrices then applies GW-based multidimensional scaling, while Multi-GWMDS generates several GW-aligned candidate embeddings and picks one representative. These approaches target cases where views have mismatched geometries or nonlinear distortions, avoiding the need for feature concatenation or rigid alignment assumptions. Experiments on synthetic manifolds and real datasets indicate the methods maintain intrinsic relational properties more effectively than classical techniques.
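As a rough illustration of the overall shape of the Mean-GWMDS pipeline (average the per-view distance matrices, then embed), here is a minimal sketch. It substitutes classical (Torgerson) MDS for the GW-based MDS step, which this summary does not specify. The function names and the toy data are assumptions made for illustration only and are not the authors' code.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mean_distance_matrix(distance_matrices):
    """Element-wise average of same-shaped distance matrices (one per view)."""
    return np.mean(np.stack(distance_matrices), axis=0)


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS; a stand-in for the GW-based MDS step."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Toy data (hypothetical): two views of the same 50 samples.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
view_a = latent @ rng.normal(size=(2, 5))           # linear lift to 5 dimensions
view_b = np.tanh(latent) @ rng.normal(size=(2, 8))  # nonlinear distortion

D_mean = mean_distance_matrix([pairwise_distances(view_a),
                               pairwise_distances(view_b)])
embedding = classical_mds(D_mean, n_components=2)
```

The sketch only needs distance matrices of matching size, so the two views can have different feature dimensions, which is the point of working with relational structure rather than concatenated features.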

Core claim

The central claim is that Gromov-Wasserstein optimal transport between view-specific distance matrices provides a flexible mechanism to aggregate or select geometry-consistent embeddings, thereby recovering a coherent low-dimensional structure that preserves the relational information present across the original views.

What carries the argument

Gromov-Wasserstein optimal transport applied to distance matrices, used either to average relational structures before multidimensional scaling or to produce and select among multiple aligned candidate embeddings.

If this is right

Multi-view data integration proceeds without forcing views into a common feature space or assuming linear relations.
Nonlinear distortions between views are accommodated through direct comparison of distance structures.
Either averaging or selection from GW-aligned candidates produces a single representative embedding.
Validation occurs on both synthetic manifolds with known geometry and real-world datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same GW alignment step could be applied to fuse relational data in graph matching or domain adaptation settings.
Weighting distance matrices by view reliability before averaging might improve robustness when some views are noisier.
Scaling the number of candidate embeddings in Multi-GWMDS could be tested to trade off computation against structure preservation.

Load-bearing premise

Pairwise distance matrices from each view adequately capture the relational structures needed for meaningful alignment via Gromov-Wasserstein transport.

What would settle it

The preservation claim would be falsified by a controlled experiment on synthetic multi-view data with a known shared manifold structure in which the proposed embeddings distort distances or clusters more than standard concatenation methods do.

Figures

Figures reproduced from arXiv: 2604.02610 by Charles Casimiro Cavalcante, Eduardo Fernandes Montesuma, Rafael Pereira Eufrazio.

**Figure 2.** Embeddings obtained on the S-curve dataset using different multi…

**Figure 3.** Low-dimensional embeddings of the Electricity Load Diagrams…

**Figure 4.** Low-dimensional embeddings of the Electricity Load Diagrams…

**Figure 5.** Embeddings on the Electricity Load Diagrams dataset obtained…

Original abstract

Multi-view data analysis seeks to integrate multiple representations of the same samples in order to recover a coherent low-dimensional structure. Classical approaches often rely on feature concatenation or explicit alignment assumptions, which become restrictive under heterogeneous geometries or nonlinear distortions. In this work, we propose two geometry-aware multi-view embedding strategies grounded in Gromov-Wasserstein (GW) optimal transport. The first, termed Mean-GWMDS, aggregates view-specific relational information by averaging distance matrices and applying GW-based multidimensional scaling to obtain a representative embedding. The second strategy, referred to as Multi-GWMDS, adopts a selection-based paradigm in which multiple geometry-consistent candidate embeddings are generated via GW-based alignment and a representative embedding is selected. Experiments on synthetic manifolds and real-world datasets show that the proposed methods effectively preserve intrinsic relational structure across views. These results highlight GW-based approaches as a flexible and principled framework for multi-view representation learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper introduces two GW-OT strategies for multi-view embeddings that avoid direct alignment, but the averaging step in Mean-GWMDS rests on an unaddressed assumption about commensurate distance scales.


The main new pieces are the Mean-GWMDS and Multi-GWMDS procedures. Mean-GWMDS takes distance matrices from each view, averages them, and feeds the result into GW-based multidimensional scaling. Multi-GWMDS instead produces several GW-aligned candidate embeddings and then selects one. Both are framed as ways to recover shared low-dimensional structure when the views have mismatched geometries or nonlinear distortions, which is a step past simple concatenation or explicit alignment methods that dominate the area. The motivation is clear and the choice of GW optimal transport fits the relational focus without requiring pointwise correspondences. That part lands reasonably well.

The experiments on synthetic manifolds and real data are said to support the claims, though the abstract gives no numbers, baselines, or implementation details to evaluate how strong the gains actually are.

The soft spot sits in Mean-GWMDS. Averaging raw distance matrices assumes the inputs sit on comparable scales; otherwise the combined matrix encodes a warped geometry. Nothing in the description indicates per-view normalization or scale-invariant preprocessing, and the stress-test concern about heterogeneous scales is not resolved by the abstract. Multi-GWMDS sidesteps this by working with alignments and selection, so it looks more robust on that dimension.

The overall argument is internally consistent and draws on established OT results without circularity or invented entities. This is aimed at researchers already comfortable with optimal transport who want concrete alternatives for multi-view representation learning. A reader in that niche would get usable ideas from the two paradigms even if the experiments need more scrutiny. It deserves peer review because the proposals are specific, the framing is honest, and the central motivation holds up despite the scale gap in one method.
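The scale concern raised for Mean-GWMDS can be made concrete with a small sketch. It assumes Frobenius-norm normalization as one possible per-view rescaling; the helper names and the toy matrices are hypothetical and chosen only to show how an unnormalized average can be dominated by the view with the larger scale.

```python
import numpy as np


def frobenius_normalize(D):
    """Rescale a distance matrix to unit Frobenius norm."""
    norm = np.linalg.norm(D, ord="fro")
    if norm == 0:
        raise ValueError("distance matrix is identically zero")
    return D / norm


def average_views(distance_matrices, normalize=True):
    """Average per-view distance matrices, optionally normalizing each first."""
    if normalize:
        distance_matrices = [frobenius_normalize(D) for D in distance_matrices]
    return np.mean(np.stack(distance_matrices), axis=0)


# Hypothetical toy views over the same 3 samples, on very different scales.
D_small = np.array([[0.0, 1.0, 2.0],
                    [1.0, 0.0, 1.0],
                    [2.0, 1.0, 0.0]])
D_large = 1000.0 * np.array([[0.0, 2.0, 1.0],
                             [2.0, 0.0, 1.0],
                             [1.0, 1.0, 0.0]])

raw_mean = average_views([D_small, D_large], normalize=False)
norm_mean = average_views([D_small, D_large], normalize=True)

# Share of the raw average's (0, 1) entry contributed by the large-scale view.
raw_share = (D_large[0, 1] / 2) / raw_mean[0, 1]
```

In this toy case the raw average essentially reproduces the large-scale view, while the normalized average gives both views equal weight. Other rescalings (for example dividing by the maximum or median distance) would be equally reasonable assumptions.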

Referee Report

2 major / 2 minor

Summary. The manuscript proposes two Gromov-Wasserstein optimal transport methods for multi-view embedding: Mean-GWMDS, which averages view-specific distance matrices before applying GW-based multidimensional scaling, and Multi-GWMDS, which generates multiple GW-aligned candidate embeddings and selects a representative one. The central claim is that these geometry-aware strategies recover intrinsic relational structure across views with heterogeneous geometries and nonlinear distortions, supported by experiments on synthetic manifolds and real-world datasets.

Significance. If the scale-commensurability issue in averaging is resolved and the experiments include rigorous quantitative metrics with baselines, the work could provide a principled alternative to feature concatenation for multi-view representation learning. The grounding in established GW OT is a strength, and the selection-based Multi-GWMDS offers a potentially robust way to handle view inconsistencies.

major comments (2)

[Abstract / Mean-GWMDS procedure] Abstract and method description of Mean-GWMDS: averaging raw distance matrices without per-view normalization or scale standardization assumes commensurate scales, which directly undermines the structure-preservation claim under heterogeneous geometries (the weakest assumption identified). This is load-bearing for the central claim, as unnormalized averaging can distort the aggregated geometry.
[Experiments] Experiments section: the abstract reports positive outcomes on synthetic and real data but provides no quantitative metrics, baselines, or ablation details; without these, the claim that the methods 'effectively preserve intrinsic relational structure' cannot be verified and requires explicit tables or figures with R², stress, or alignment error comparisons.

minor comments (2)

[Method] Notation: clarify whether distance matrices are Euclidean, geodesic, or kernel-based, and specify the GW formulation (e.g., quadratic or entropic) used in each method.
[Introduction] Missing references: add citations to prior GW-MDS work and multi-view embedding baselines (e.g., CCA variants or manifold alignment methods) for context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our methods and strengthen the empirical evaluation. We address each major comment below and have revised the manuscript to incorporate the suggested improvements.


Referee: [Abstract / Mean-GWMDS procedure] Abstract and method description of Mean-GWMDS: averaging raw distance matrices without per-view normalization or scale standardization assumes commensurate scales, which directly undermines the structure-preservation claim under heterogeneous geometries (the weakest assumption identified). This is load-bearing for the central claim, as unnormalized averaging can distort the aggregated geometry.

Authors: We agree that averaging unnormalized distance matrices risks scale distortion under heterogeneous geometries. In the revised manuscript we introduce per-view normalization (dividing each distance matrix by its Frobenius norm) before averaging, and we update both the abstract and the Mean-GWMDS procedure description to reflect this step. This ensures scale commensurability while preserving the relational structure that GW-MDS then recovers. revision: yes
Referee: [Experiments] Experiments section: the abstract reports positive outcomes on synthetic and real data but provides no quantitative metrics, baselines, or ablation details; without these, the claim that the methods 'effectively preserve intrinsic relational structure' cannot be verified and requires explicit tables or figures with R², stress, or alignment error comparisons.

Authors: We acknowledge the need for explicit quantitative evaluation. The revised manuscript now includes tables reporting stress, R², and Procrustes alignment error for both proposed methods against baselines (feature concatenation, CCA, and standard MDS). We also add an ablation on the number of candidate embeddings in Multi-GWMDS and include these results in the main text with corresponding figures. revision: yes
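For readers unfamiliar with the stress metric named in the second response, here is a minimal sketch of one common normalized form of stress. Whether the revised manuscript uses this exact normalization is an assumption; the function names and the toy data are hypothetical.

```python
import numpy as np


def embedding_distances(Y):
    """Euclidean distance matrix between the rows of an embedding Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def normalized_stress(D_target, Y):
    """sqrt( sum_{i<j} (D_target - D_embed)^2 / sum_{i<j} D_target^2 )."""
    D_embed = embedding_distances(Y)
    upper = np.triu_indices_from(D_target, k=1)   # count each pair once
    residual = D_target[upper] - D_embed[upper]
    return float(np.sqrt((residual ** 2).sum() / (D_target[upper] ** 2).sum()))


# Toy check (hypothetical data): a perfect embedding has zero stress,
# and uniformly shrinking it by half gives a stress of 0.5.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D_target = embedding_distances(points)

stress_exact = normalized_stress(D_target, points)
stress_shrunk = normalized_stress(D_target, 0.5 * points)
```

Because this form of stress is sensitive to global scale, a comparison across methods would need a consistent scaling convention for the embeddings.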

Circularity Check

0 steps flagged

No circularity: methods grounded in external GW OT framework with independent experimental validation


The paper defines Mean-GWMDS as averaging view distance matrices followed by GW-MDS and Multi-GWMDS as GW alignment plus selection. Both steps invoke the standard Gromov-Wasserstein optimal transport formulation (an external, independently established mathematical object) rather than defining any quantity in terms of itself or renaming a fitted parameter as a prediction. No equations reduce the output embedding to the input distances by algebraic identity, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. Experiments on synthetic manifolds and real datasets serve as external checks rather than tautological confirmation. The scale-commensurability concern raised by the skeptic is an assumption-validity issue, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all technical details are absent.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

"Mean-GWMDS aggregates view-specific relational information by averaging distance matrices and applying GW-based multidimensional scaling to obtain a representative embedding."

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

"$\mathrm{GW}_2^2(\mu,\nu) = \min_{\pi} \sum_{i,j,k,\ell} L\big((D_X)_{i\ell}, (D_Y)_{jk}\big)\, \pi_{ij}\, \pi_{\ell k}$, with $L$ the squared loss on intra-domain distance matrices."
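To make the quoted objective concrete, the following sketch evaluates the sum for a fixed coupling π rather than minimizing over couplings. It assumes the squared loss L(a, b) = (a − b)², as the passage states; the function name and the toy inputs are hypothetical.

```python
import numpy as np


def gw_objective(D_X, D_Y, pi):
    """Evaluate sum_{i,j,k,l} (D_X[i,l] - D_Y[j,k])**2 * pi[i,j] * pi[l,k]."""
    # diff[i, l, j, k] = D_X[i, l] - D_Y[j, k]
    diff = D_X[:, :, None, None] - D_Y[None, None, :, :]
    return float(np.einsum("iljk,ij,lk->", diff ** 2, pi, pi))


# Toy example (hypothetical): three points on a line, compared with themselves.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])

identity_coupling = np.eye(3) / 3          # matches each point to itself
uniform_coupling = np.full((3, 3), 1 / 9)  # independent coupling of uniform marginals

cost_identity = gw_objective(D, D, identity_coupling)
cost_uniform = gw_objective(D, D, uniform_coupling)
```

The identity coupling gives zero cost because every pairwise distance is matched with itself, while the uniform coupling pays for every mismatched pair. The explicit four-index tensor costs O(n²m²) memory, so this form is only suitable for small illustrations.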

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Gromov-Wasserstein Methods for Multi-View Relational Embedding and Clustering
cs.LG 2026-04 unverdicted novelty 6.0

Bary-GWMDS computes Gromov-Wasserstein barycenters of distance matrices to produce stable consensus embeddings from multi-view relational data, and Mean-GWMDS-C averages distances for reduced-support clustering.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · cited by 1 Pith paper

[1] Z. Yu, Z. Dong, C. Yu, K. Yang, Z. Fan, C. P. Chen, A review on multi-view learning, Frontiers of Computer Science 19 (7) (2025) 197334
[2] Q. Zhang, L. Zhang, R. Song, R. Cong, Y. Liu, W. Zhang, Learning common semantics via optimal transport for contrastive multi-view clustering, IEEE Transactions on Image Processing (2024)
[3] R. Lin, S. Du, S. Wang, W. Guo, Multi-view clustering via optimal transport algorithm, Knowledge-Based Systems 279 (2023) 110954
[4] G. Peyré, M. Cuturi, Computational optimal transport: With applications to data science, Foundations and Trends® in Machine Learning 11 (5-6) (2019) 355–607
[5] H. Van Assel, C. Vincent-Cuaz, N. Courty, R. Flamary, P. Frossard, T. Vayer, Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein, Transactions on Machine Learning Research (2025). URL https://openreview.net/forum?id=cllm6SS354
[6] R. A. Clark, T. Needham, T. Weighill, Generalized dimension reduction using semi-relaxed Gromov-Wasserstein distance, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, 2025, pp. 16082–16090
[7] R. P. Eufrazio, E. F. Montesuma, C. C. Cavalcante, A dimensionality reduction technique based on the Gromov-Wasserstein distance, in: International Conference on Geometric Science of Information, Springer, 2025, pp. 111–120
[8] E. F. Montesuma, F. M. N. Mboula, A. Souloumiac, Recent advances in optimal transport for machine learning, IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)
[9] F. Mémoli, Gromov–Wasserstein distances and the metric approach to object matching, Foundations of Computational Mathematics 11 (2011) 417–487
[10] T. Kerdoncuff, R. Emonet, M. Sebban, Sampled Gromov-Wasserstein, Machine Learning 110 (8) (2021) 2151–2186
[11] Y. Qin, X. Zhang, S. Yu, G. Feng, A survey on representation learning for multi-view data, Neural Networks 181 (2025) 106842
[12] A. R. Chowdhury, A. Gupta, S. Das, Deep multi-view clustering: A comprehensive survey of the contemporary techniques, Information Fusion (2025) 103012
[13] M. Salter-Townshend, T. H. McCormick, Latent space models for multiview network data, The Annals of Applied Statistics 11 (3) (2017) 1217
[14] C. Xu, D. Tao, C. Xu, A survey on multi-view learning, arXiv preprint arXiv:1304.5634 (2013)
[15] G. Peyré, M. Cuturi, J. Solomon, Gromov-Wasserstein averaging of kernel and distance matrices, in: International Conference on Machine Learning, PMLR, 2016, pp. 2664–2672
[16] A. Trindade, ElectricityLoadDiagrams20112014, UCI Machine Learning Repository, DOI: https://doi.org/10.24432/C58C86 (2015)
[17] M. Kelly, R. Longjohn, K. Nottingham, The UCI machine learning repository (2023)
[18] Y. Shi, T. Yu, Q. Liu, H. Zhu, F. Li, Y. Wu, An approach of electrical load profile analysis based on time series data mining, IEEE Access 8 (2020) 209915–209925
[19] H. Hotelling, Relations between two sets of variates, in: Breakthroughs in Statistics: Methodology and Distribution, Springer, 1992, pp. 162–190
[20] S. Sun, A survey of multi-view machine learning, Neural Computing and Applications 23 (7) (2013) 2031–2038
[21] J. R. Kettenring, Canonical analysis of several sets of variables, Biometrika 58 (3) (1971) 433–451
[22] S. Bai, X. Bai, L. J. Latecki, Q. Tian, Multidimensional scaling on multiple input distance matrices, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 31, 2017, pp. 1281–1287
[23] T. Rodosthenous, V. Shahrezaei, M. Evangelou, Multi-view data visualisation via manifold learning, PeerJ Computer Science 10 (2024) e1993