Sum-of-Squares Hierarchy for the Gromov Wasserstein Problem
Pith reviewed 2026-05-23 03:32 UTC · model grok-4.3
The pith
Sum-of-squares hierarchies yield tractable semidefinite programs that approximate the Gromov-Wasserstein distance and converge to the global optimum.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed SOS hierarchies provide computationally tractable proxies of the GW distance and the associated distortion distances over metric measure spaces that are otherwise intractable to compute, by simplifying the moment hierarchies with marginal constraints while proving convergence to the global optimum.
What carries the argument
The simplified Putinar-type and Schmüdgen-type moment hierarchies for the GW quadratic program, which use marginal constraints to reduce the semidefinite programs and ensure convergence.
If this is right
- The hierarchies converge to the global optimal solutions of the GW problem.
- The induced measures satisfy the triangle inequality and act as genuine distances.
- These provide proxies for GW distance that can be computed via semidefinite programming.
- Convergence rates are established for the hierarchies towards the optimum.
Where Pith is reading between the lines
- If the marginal constraints always allow such simplification, similar SOS approaches could apply to other optimal transport variants with quadratic objectives.
- These distances might be used in machine learning pipelines where exact GW is too slow, offering a tunable accuracy via hierarchy level.
- Testing on small metric spaces where GW can be solved exactly would validate the convergence claims in practice.
Load-bearing premise
The marginal constraints of the GW problem can be used to simplify the moment hierarchies while still allowing convergence to the global optimum.
What would settle it
Solve the SOS hierarchy for a pair of small discrete metric spaces where the exact GW optimum is known by enumeration, and check if the relaxation value matches the true value at finite level.
Figures
read the original abstract
The Gromov-Wasserstein (GW) problem is a variant of the classical optimal transport problem that allows one to compute meaningful transportation plans between incomparable spaces. At an intuitive level, it seeks plans that minimize the discrepancy between metric evaluations of pairs of points. The GW problem is typically cast as an instance of a non-convex quadratic program that is, unfortunately, intractable to solve. In this paper, we describe tractable semidefinite relaxations of the GW problem based on the Sum-of-Squares (SOS) hierarchy. We describe how the Putinar-type and the Schm\"udgen-type moment hierarchies can be simplified using marginal constraints, and we prove convergence rates for these hierarchies towards computing global optimal solutions to the GW problem. The proposed SOS hierarchies naturally induce a distance measure analogous to the distortion metrics, and we show that these are genuine distances in that they satisfy the triangle inequality. In particular, the proposed SOS hierarchies provide computationally tractable proxies of the GW distance and the associated distortion distances (over metric measure spaces) that are otherwise intractable to compute.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops sum-of-squares (SOS) hierarchies for the non-convex Gromov-Wasserstein (GW) optimal transport problem. It shows how to simplify the Putinar-type and Schmüdgen-type moment relaxations by incorporating the linear marginal constraints of the transport plan, proves convergence rates of these hierarchies to the global optimum, and demonstrates that the induced distances satisfy the triangle inequality, providing tractable proxies for the GW distance.
Significance. If the claimed convergence holds after the proposed simplifications, this work would offer a theoretically grounded computational approach to an important but intractable problem in optimal transport between metric measure spaces. The triangle inequality property strengthens the practical utility. The use of established SOS theory is a strength.
major comments (3)
- [Section describing the simplified moment hierarchies (likely §3 or §4)] The central claim that marginal constraints simplify the Putinar/Schmüdgen hierarchies while preserving convergence to the global GW optimum is load-bearing. The manuscript must explicitly verify that the reduced quadratic module (after quotienting by the marginal ideal) remains archimedean, as required for Putinar's positivstellensatz and the density of the truncated module; otherwise a positive gap may persist even at infinite degree.
- [Proof of convergence rates (likely §5)] The asserted convergence rates for the reduced hierarchies must be derived from the standard rates for Putinar/Schmüdgen relaxations; any modification induced by the marginal equalities needs to be quantified to confirm the rates remain valid and do not introduce a gap at finite relaxation order.
- [Section on induced distances and triangle inequality (likely §6)] The proof that the SOS-induced distortion distances satisfy the triangle inequality relies on the specific form of the relaxations; this property is load-bearing for the claim that they are genuine distances, and the argument should be checked for dependence on the marginal simplification.
minor comments (2)
- [Notation and definitions] Clarify notation for the localizing matrices in the reduced hierarchies to distinguish the original and marginal-constrained cases.
- [Numerical section] If numerical experiments are present, include a table comparing the SOS bounds to known GW values or other relaxations on small instances to illustrate tightness.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. We address each of the three major comments below and will revise the paper to strengthen the explicit justifications for the key technical claims.
read point-by-point responses
-
Referee: [Section describing the simplified moment hierarchies (likely §3 or §4)] The central claim that marginal constraints simplify the Putinar/Schmüdgen hierarchies while preserving convergence to the global GW optimum is load-bearing. The manuscript must explicitly verify that the reduced quadratic module (after quotienting by the marginal ideal) remains archimedean, as required for Putinar's positivstellensatz and the density of the truncated module; otherwise a positive gap may persist even at infinite degree.
Authors: We agree that an explicit verification of archimedeanness after reduction is essential. The manuscript implicitly relies on compactness of the metric measure spaces to preserve this property under the linear marginal ideal, but we will add a dedicated lemma in the section on simplified hierarchies proving that the reduced quadratic module remains archimedean. This will be included in the revision. revision: yes
-
Referee: [Proof of convergence rates (likely §5)] The asserted convergence rates for the reduced hierarchies must be derived from the standard rates for Putinar/Schmüdgen relaxations; any modification induced by the marginal equalities needs to be quantified to confirm the rates remain valid and do not introduce a gap at finite relaxation order.
Authors: We will expand the convergence analysis in Section 5 to explicitly derive the rates from the standard Putinar/Schmüdgen theory, quantifying the impact of the marginal equalities. This will confirm that no additional finite-order gaps are introduced beyond the standard hierarchy rates. revision: yes
-
Referee: [Section on induced distances and triangle inequality (likely §6)] The proof that the SOS-induced distortion distances satisfy the triangle inequality relies on the specific form of the relaxations; this property is load-bearing for the claim that they are genuine distances, and the argument should be checked for dependence on the marginal simplification.
Authors: We have verified that the triangle inequality argument relies only on the monotonicity and consistency properties of the relaxations, which are preserved after the marginal simplification. We will add a clarifying remark in Section 6 to make this independence explicit. revision: yes
Circularity Check
No circularity: standard SOS theory applied to GW marginals with independent convergence proof
full rationale
The derivation applies established Putinar and Schmüdgen moment hierarchies to the non-convex GW quadratic program, then uses the problem's own linear marginal constraints to simplify the SDP relaxations while claiming to prove convergence to the global optimum. No step reduces the claimed convergence or distance property to a quantity defined by the paper itself, a fitted parameter renamed as prediction, or a self-citation chain. The central results rest on external SOS representation theorems and the algebraic structure of the marginal ideal, which are independent of the target GW value. This is self-contained against external benchmarks and matches the expected non-circular case.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Gromov-Wasserstein Alignment of Word Embedding Spaces
[AMJ18] David Alvarez-Melis and Tommi Jaakkola. Gromov-Wasserstein Alignment of Word Embedding Spaces. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , pages 1881–1890. Association for Com- putational Linguistics,
work page 2018
-
[2]
Exactness conditions for semidefinite relaxations of the quadratic assignment problem
[CS24] Junyu Chen and Yong Sheng Soh. Exactness conditions for semidefinite relaxations of the quadratic assignment problem. arXiv preprint arXiv:2409.08802 ,
-
[3]
Gromov-Wasserstein Optimal Transport to Align Single-cell Multi-omics Data
[DSS+20] Pinar Demetci, Rebecca Santorella, Bj¨ orn Sandstede, William Stafford Noble, and Ritambhara Singh. Gromov-Wasserstein Optimal Transport to Align Single-cell Multi-omics Data. BioRxiv, pages 2020–04,
work page 2020
-
[4]
The NP-hardness of the Gromov-Wasserstein distance
[Kra24] Natalia Kravtsova. The NP-hardness of the Gromov-Wasserstein distance. arXiv preprint arXiv:2408.06525,
-
[5]
On the Exactness of SDP Relaxation for Quadratic Assignment Problem
[Lin24] Shuyang Ling. On the Exactness of SDP Relaxation for Quadratic Assignment Problem. arXiv preprint arXiv:2408.05942 ,
-
[6]
Convergence Rates of S.O.S Hierarchies for Polynomial Semidefinite Programs
[TT24] Hoang Anh Tran and Kim-Chuan Toh. Convergence Rates of S.O.S Hierarchies for Polynomial Semidefinite Programs. arXiv preprint arXiv:2406.12013 ,
-
[7]
On the Convergence Rates of Moment- SOS Hierarchies Approximation of Truncated Moment Sequences
[TT25] Hoang Anh Tran and Kim-Chuan Toh. On the Convergence Rates of Moment- SOS Hierarchies Approximation of Truncated Moment Sequences. arXiv preprint arXiv:2507.00572,
-
[8]
A polynomial-time relaxation of the Gromov-Hausdorff distance
28 [VBBW16] Soledad Villar, Afonso S Bandeira, Andrew J Blumberg, and Rachel Ward. A Polynomial-time Relaxation of the Gromov-Hausdorff distance. arXiv preprint arXiv:1610.05214,
work page internal anchor Pith review Pith/arXiv arXiv
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.