pith. sign in

arxiv: 2606.10295 · v1 · pith:XQJMJLIUnew · submitted 2026-06-09 · 📊 stat.ML · cs.LG· math.ST· stat.TH

k-Nearest Neighbors in Gromov--Wasserstein Space

Pith reviewed 2026-06-27 12:01 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.STstat.TH
keywords Gromov-Wasserstein distancek-nearest neighborsuniversal consistencygraph classificationmetric measure spacesfused Gromov-Wassersteinnode-attributed graphs
0
0 comments X

The pith

The k-nearest neighbors classifier is universally consistent under the Gromov-Wasserstein distance when applied to graphs represented as metric measure spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that k-nearest neighbors classification using the Gromov-Wasserstein distance converges in probability to the Bayes-optimal classifier as the training sample size grows to infinity. This holds on the space of equivalence classes of metric measure spaces that have finite support and carry the uniform probability measure. The same guarantee transfers to ordinary graphs by equipping each graph with its shortest-path or adjacency distances and the uniform measure on vertices, and it extends via the fused Gromov-Wasserstein distance to graphs whose nodes carry Euclidean feature vectors.

Core claim

We prove the universal consistency of the GW-k-NN classifier on the space of equivalence classes of metric measure spaces with finite support and uniform probability measure. By viewing graphs as finitely supported metric measure spaces equipped with the pairwise distance metric and a uniform probability measure on the nodes, we obtain universal consistency of GW-k-NN for the space of graphs. Likewise for fGW-k-NN, we prove universal consistency on the space of weak isomorphism classes of structured objects consisting of metric measure spaces with finite support and uniform probability measure and feature maps into Euclidean space.

What carries the argument

The Gromov-Wasserstein distance between equivalence classes of finite metric measure spaces, which defines a metric on the quotient space and thereby turns the set of graphs into a metric space on which k-NN can be run directly.

If this is right

  • GW-k-NN is consistent for any classification problem whose data are finite graphs.
  • fGW-k-NN is consistent for any classification problem whose data are finite node-attributed graphs.
  • Consistency holds without first embedding the graphs into a Euclidean vector space.
  • The same consistency guarantee applies to any data type that can be faithfully cast as a finite metric measure space with uniform measure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result supplies a justification for trying other nonparametric classifiers inside the same GW metric space.
  • It raises the question whether similar consistency statements hold when the uniform measure on nodes is replaced by a non-uniform measure derived from node degrees or other graph statistics.
  • The framework immediately suggests testing whether the same k-NN rule remains consistent when the underlying distance is replaced by other optimal-transport-type distances on graphs.

Load-bearing premise

Representing a graph by its nodes equipped with pairwise distances and the uniform probability measure on nodes preserves all information needed for the classification task.

What would settle it

A concrete sequence of probability distributions on graphs (with labels) for which the GW-k-NN error rate remains bounded away from the Bayes error for arbitrarily large sample sizes.

Figures

Figures reproduced from arXiv: 2606.10295 by Caroline Moosmueller, Kaitlyn Hohmeier, Nicolas Fraiman.

Figure 1
Figure 1. Figure 1: Comparison of a graph versus an attributed graph. Left: By viewing a graph as an mm–space where the [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
read the original abstract

The Gromov--Wasserstein (GW) distance provides a framework for comparing metric measure spaces, regardless of their underlying structure or geometry. For network-based data, it enables direct comparisons of graphs with different numbers of nodes, without requiring an embedding or other abstraction. Furthermore, through a variant of GW known as fused Gromov--Wasserstein (fGW), it is also possible to incorporate node features in addition to graph structure. In this work, we implement $k$-nearest neighbors ($k$-NN) classification using the GW and fGW distances. We prove the universal consistency of the GW-$k$-NN classifier on the space of equivalence classes of metric measure spaces with finite support and uniform probability measure. By viewing graphs as finitely supported metric measure spaces equipped with the pairwise distance metric and a uniform probability measure on the nodes, we obtain universal consistency of GW-$k$-NN for the space of graphs. Likewise for fGW-$k$-NN, we prove universal consistency on the space of weak isomorphism classes of structured objects consisting of metric measure spaces with finite support and uniform probability measure and feature maps into Euclidean space, thus establishing universal consistency on the space of node-attributed graphs. Our numerical experiments show that GW-$k$-NN and fGW-$k$-NN consistently perform well across multiple graph datasets, suggesting that metric classifiers such as $k$-NN work well in the GW framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper implements k-NN classification using Gromov-Wasserstein (GW) and fused GW (fGW) distances on metric measure spaces. It proves universal consistency of GW-k-NN on the quotient space of equivalence classes of finite-support uniform mm-spaces, then identifies graphs with these objects via pairwise distances and uniform node measures to claim consistency for graph classification; an analogous result holds for fGW-k-NN on node-attributed graphs. Experiments on several graph datasets are reported to show competitive performance.

Significance. If the consistency theorems are correctly established, the result supplies a nonparametric consistency guarantee for a metric classifier directly on the GW space of mm-spaces and, by the stated identification, on graphs. This is a substantive theoretical contribution because it avoids embedding or alignment steps and applies to graphs of varying size. The experiments provide supporting empirical evidence, though without reported variance or detailed baselines their strength is limited.

major comments (2)
  1. [§5] §4 (proof of universal consistency) and §5 (extension to graphs): the central claim that consistency on the quotient space of finite-support uniform mm-spaces immediately yields consistency for graph classification rests on the unstated assumption that every graph distribution and label function factors through the pairwise-distance + uniform-measure representation without loss; no theorem is given showing that labels depending on directedness, edge attributes, or other non-metric information remain measurable functions on the GW quotient.
  2. [§5.1] §5.1 (fGW case): the analogous claim for node-attributed graphs via fGW likewise identifies structured objects with mm-spaces plus Euclidean feature maps, but does not address whether the weak-isomorphism classes preserve label-relevant information when node features interact with non-metric graph properties.
minor comments (2)
  1. [Experiments] Experiments section: error bars, standard deviations, or multiple random seeds are not reported, and baseline details (e.g., which competing methods and hyper-parameter ranges) are only sketched.
  2. [§3] Notation: the precise definition of the quotient space (equivalence classes under GW) and the topology used for universal consistency should be stated explicitly before the theorem, rather than referenced only in passing.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and valuable comments. We respond to the major comments below and will make revisions to clarify the scope of our results.

read point-by-point responses
  1. Referee: [§5] §4 (proof of universal consistency) and §5 (extension to graphs): the central claim that consistency on the quotient space of finite-support uniform mm-spaces immediately yields consistency for graph classification rests on the unstated assumption that every graph distribution and label function factors through the pairwise-distance + uniform-measure representation without loss; no theorem is given showing that labels depending on directedness, edge attributes, or other non-metric information remain measurable functions on the GW quotient.

    Authors: The paper establishes universal consistency for the GW-k-NN classifier on the quotient space of finite-support uniform mm-spaces. The extension to graphs is made by identifying undirected graphs (without edge attributes) with such spaces via their pairwise distance metric and uniform node measure. This identification is the standard one used in the GW literature for graphs. For label functions that depend on properties not encoded in this representation, such as directedness or edge attributes, the GW distance cannot distinguish them, and our consistency result does not apply to those cases. We did not intend to claim universality over all possible graph distributions. We will revise the manuscript to explicitly delimit the class of graphs considered and to note that the relevant label functions are measurable with respect to the quotient space under this identification. revision: yes

  2. Referee: [§5.1] §5.1 (fGW case): the analogous claim for node-attributed graphs via fGW likewise identifies structured objects with mm-spaces plus Euclidean feature maps, but does not address whether the weak-isomorphism classes preserve label-relevant information when node features interact with non-metric graph properties.

    Authors: Analogously, the fGW result applies to the weak isomorphism classes of mm-spaces equipped with feature maps. For node-attributed graphs, we use this representation assuming that the node features and the metric structure capture the relevant information for the labels. If node features interact with non-metric properties in a way not preserved, the result would not hold. We will add a clarifying statement in §5.1 regarding the scope. revision: yes

Circularity Check

0 steps flagged

No circularity: mathematical consistency proof is self-contained

full rationale

The central claim is a proof of universal consistency of GW-k-NN on the quotient space of finite-support uniform mm-spaces (and likewise for fGW on structured objects). This is a direct extension of classical k-NN consistency theorems in metric spaces to the GW metric; the derivation chain consists of standard measure-theoretic arguments and does not reduce any quantity to a fitted parameter, self-definition, or load-bearing self-citation. The modeling step that identifies graphs with pairwise-distance mm-spaces is an explicit assumption stated in the abstract, not a hidden circular reduction. No equations or steps in the provided text exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The consistency proof rests on the modeling choice that graphs are mm-spaces with uniform node measure and on standard results from statistical learning theory for k-NN consistency in metric spaces.

axioms (2)
  • domain assumption Graphs are faithfully modeled as metric measure spaces with finite support and uniform probability measure on nodes.
    Invoked to transfer the mm-space consistency result to the space of graphs.
  • standard math Universal consistency of k-NN holds in the GW metric on the space of equivalence classes of finite-support mm-spaces.
    The paper states it proves this; the background theory of consistency in metric spaces is assumed.

pith-pipeline@v0.9.1-grok · 5794 in / 1251 out tokens · 30400 ms · 2026-06-27T12:01:30.324334+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 9 canonical work pages

  1. [1]

    Sur la distance de Nagata.C

    Patrice Assouad. Sur la distance de Nagata.C. R. Acad. Sci. Paris, 294(1):31–34, 1982

  2. [2]

    Recouvrements, Derivation Des Measures et Dimensions

    Patrice Assouad and Thierry Quentin de Gromard. Recouvrements, Derivation Des Measures et Dimensions. Rev. Mat. Iberoam, 22(3):893–953, 2006

  3. [3]

    The Z-Gromov-Wasserstein Distance.Journal of Machine Learning Research, 26:1–57, 2025

    Martin Bauer, Facundo M´emoli, Tom Needham, and Mao Nishino. The Z-Gromov-Wasserstein Distance.Journal of Machine Learning Research, 26:1–57, 2025

  4. [4]

    On a Linear Gromov-Wasserstein Distance.IEEE Transactions on Image Processing, 31:7292–7305, 2022

    Florian Beier, Robert Beinert, and Gabriele Steidl. On a Linear Gromov-Wasserstein Distance.IEEE Transactions on Image Processing, 31:7292–7305, 2022

  5. [5]

    Borgwardt and Hans-Peter Kriegel

    Karsten M. Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. InProceedings of the Fifth IEEE International Conference on Data Mining, ICDM ’05, page 74–81, USA, 2005. IEEE Computer Society. ISBN 0769522785. doi: 10.1109/ICDM.2005.132. URLhttps://doi.org/10.1109/ICDM.2005.132. 16

  6. [6]

    Borgwardt, Cheng Soon Ong, Stefan Sch¨onauer, S

    Karsten M. Borgwardt, Cheng Soon Ong, Stefan Sch¨onauer, S. V. N. Vishwanathan, Alex J. Smola, and Hans-Peter Kriegel. Protein function prediction via graph kernels.Bioinformatics, 21(1):47–56, January 2005. ISSN 1367-

  7. [7]

    URLhttps://doi.org/10.1093/bioinformatics/bti1007

    doi: 10.1093/bioinformatics/bti1007. URLhttps://doi.org/10.1093/bioinformatics/bti1007

  8. [8]

    Classification with Imperfect Training Labels

    Timothy Cannings, Yingying Fan, and Richard Samworth. Classification with Imperfect Training Labels. Biometrika, 107(2):311–330, 2020

  9. [9]

    Nearest Neighbor Classification in Infinite Dimension.ESAIM: Probability and Statistics, 10:340–355, 2006

    Fr ´ed´eric C´erou and Arnaud Guyader. Nearest Neighbor Classification in Infinite Dimension.ESAIM: Probability and Statistics, 10:340–355, 2006

  10. [10]

    Rates of convergence for nearest neighbor classification

    Kamalika Chaudhuri and Sanjoy Dasgupta. Rates of convergence for nearest neighbor classification. In Z. Ghahra- mani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors,Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URLhttps://proceedings.neurips.cc/ paper_files/paper/2014/file/2b764b803acec2d590...

  11. [11]

    Universal consistency of the𝑘-nn rule in metric spaces and nagata dimension.Probability and Statistics, 24:914–934, 2020

    Benoit Collins, Sushma Kumari, and Vladimir Pestov. Universal consistency of the𝑘-nn rule in metric spaces and nagata dimension.Probability and Statistics, 24:914–934, 2020. doi: 10.1051/ps/2020018. URLhttps: //www.numdam.org/articles/10.1051/ps/2020018/

  12. [12]

    Cover and P

    T. Cover and P. Hart. Nearest neighbor pattern classification.IEEE Transactions on Information Theory, 13(1): 21–27, 1967

  13. [14]

    Springer, 1996

    Luc Devroye, L ´aszl´o Gy¨orfi, and Gabor Lugosi.A Probabilistic Theory of Pattern Recognition. Springer, 1996

  14. [15]

    Distinguishing enzyme structures from non-enzymes without alignments.Journal of Molecular Biology, 330(4):771–783, 2003

    Paul Dobson and Andrew Doig. Distinguishing enzyme structures from non-enzymes without alignments.Journal of Molecular Biology, 330(4):771–783, 2003. ISSN 0022-2836. doi: https://doi.org/10.1016/S0022-2836(03) 00628-4. URLhttps://www.sciencedirect.com/science/article/pii/S0022283603006284

  15. [16]

    Rate of Convergence of k-Nearest-Neighbor Classification Rule

    Maik D ¨oring, L´aszl´o Gy¨orfi, and Harro Walk. Rate of Convergence of k-Nearest-Neighbor Classification Rule. Journal of Machine Learning Research, 18:1–16, 2018

  16. [17]

    Scalable kernels for graphs with continuous attributes

    Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, and Karsten Borgwardt. Scalable kernels for graphs with continuous attributes. InProceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, NIPS’13, page 216–224, Red Hook, NY, USA, 2013. Curran Associates Inc

  17. [18]

    Alaya, Aur´elie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, L´eo Gautheron, Nathalie T.H

    R ´emi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aur´elie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, L´eo Gautheron, Nathalie T.H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander T...

  18. [19]

    Scalable gromov–wasserstein based comparison of biological time series.Bulletin of Mathematical Biology, 85(77), 2023

    Natalia Kravtsova, Reginald McGee, and Adriana Dawes. Scalable gromov–wasserstein based comparison of biological time series.Bulletin of Mathematical Biology, 85(77), 2023

  19. [20]

    PhD thesis, Kyoto University, 2018

    Sushma Kumari.Topics in Random Matrices and Statistical Machine Learning. PhD thesis, Kyoto University, 2018

  20. [21]

    Nagata dimension, quasisymmetric embeddings, and lipschitz extensions

    Urs Lang and Thilo Schlichenmaier. Nagata dimension, quasisymmetric embeddings, and lipschitz extensions. International Mathematics Research Notices, 2005(58):3625–3655, 01 2005. ISSN 1073-7928. doi: 10.1155/ IMRN.2005.3625. URLhttps://doi.org/10.1155/IMRN.2005.3625

  21. [22]

    The pharmacophore kernel for virtual screening with support vector machines.Journal of Chemical Information and Modeling, 46(5):2003–14, August

    Pierre Mah ´e, Liva Ralaivola, V ´eronique Stoven, and Jean-Philippe Vert. The pharmacophore kernel for virtual screening with support vector machines.Journal of Chemical Information and Modeling, 46(5):2003–14, August

  22. [23]

    URLhttps://hal.science/hal-00433580

    doi: 10.1021/ci060138m. URLhttps://hal.science/hal-00433580. 17

  23. [24]

    Gromov-Wasserstein Distances and the Metric Approach to Object Matching.Foundations of Computational Mathematics, 11:417–487, 2011

    Facundo M ´emoli. Gromov-Wasserstein Distances and the Metric Approach to Object Matching.Foundations of Computational Mathematics, 11:417–487, 2011

  24. [25]

    Note on Dimension Theory for Metric Spaces.Fundamenta Mathematicae, 45(1):143–181, 1958

    Jun-iti Nagata. Note on Dimension Theory for Metric Spaces.Fundamenta Mathematicae, 45(1):143–181, 1958

  25. [26]

    Propagation kernels: Ef- ficient graph kernels from propagated information.Machine Learning, 102:209–245, 2015

    Marion Neumann, Roman Garnett, Christian Bauckhage, and Kristian Kersting. Propagation kernels: Ef- ficient graph kernels from propagated information.Machine Learning, 102:209–245, 2015. doi: 10.1007/ s10994-015-5517-9. URLhttps://link.springer.com/article/10.1007/s10994-015-5517-9

  26. [27]

    Gromov-Wasserstein Averaging of Kernel and Distance Matrices.Proceedings of the 33rd International Conference on Machine Learning, 48, 2016

    Gabriel Peyr ´e, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein Averaging of Kernel and Distance Matrices.Proceedings of the 33rd International Conference on Machine Learning, 48, 2016

  27. [28]

    A novel sliced fused gromov-wasserstein distance, 2025

    Moritz Piening and Robert Beinert. A novel sliced fused gromov-wasserstein distance, 2025

  28. [29]

    Universal Consistency of Wasserstein k-NN Classifier: A Negative and Some Positive Results.Information and Inference: A Journal of the IMA, 12(3), 2023

    Donlapark Ponnoprat. Universal Consistency of Wasserstein k-NN Classifier: A Negative and Some Positive Results.Information and Inference: A Journal of the IMA, 12(3), 2023

  29. [30]

    Dimension of Metrics and Differentiation of Measures.General Topology and Its Relations to Modern Analysis and Algebra, V (Prague, 1981), 3:565–568, 1983

    David Preiss. Dimension of Metrics and Differentiation of Measures.General Topology and Its Relations to Modern Analysis and Algebra, V (Prague, 1981), 3:565–568, 1983

  30. [31]

    Entropic gromov-wasserstein distances: stability and algorithms

    Gabriel Rioux, Ziv Goldfeld, and Kengo Kato. Entropic gromov-wasserstein distances: stability and algorithms. J. Mach. Learn. Res., 25(1), January 2024. ISSN 1532-4435

  31. [32]

    Modelling Convex Shape Priors and Matching Based on the Gromov-Wasserstein Distance.J

    Bernhard Schmitzer and Christoph Schn ¨orr. Modelling Convex Shape Priors and Matching Based on the Gromov-Wasserstein Distance.J. Math. Imaging Vis., 46(1):143–159, May 2013. ISSN 0924-9907. doi: 10.1007/s10851-012-0375-6. URLhttps://doi.org/10.1007/s10851-012-0375-6

  32. [33]

    Efficient graphlet kernels for large graph comparison

    Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. Efficient graphlet kernels for large graph comparison. In David van Dyk and Max Welling, editors,Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, volume 5 ofProceedings of Machine Learning Research, pages 488–495, Hilton C...

  33. [34]

    Consistent Nonparametric Regression.The Annals of Statistics, 5(4):595–620, 1977

    Charles Stone. Consistent Nonparametric Regression.The Annals of Statistics, 5(4):595–620, 1977

  34. [35]

    Number 1443 in Volume 290

    Karl-Theodor Sturm.The Space of Spaces: Curvature Bounds and Gradient Flows on the Space of Metric Measure Spaces. Number 1443 in Volume 290. American Mathematical Society, 2023

  35. [36]

    Spline-Fitting with a Genetic Algorithm: A Method for Develping Classification Structure-Activity Relationships.Journal of Chemical Information and Computer Science, 43(6), 2003

    Jeffrey Sutherland, Lee O’Brian, and Donald Weaver. Spline-Fitting with a Genetic Algorithm: A Method for Develping Classification Structure-Activity Relationships.Journal of Chemical Information and Computer Science, 43(6), 2003

  36. [37]

    Fused Gromov-Wasserstein Distance for Structured Objects.Algorithms, 2020

    Titouan Vayer, Laetitia Chapel, R´emi Flamary, Romain Tavenard, and Nicolas Courty. Fused Gromov-Wasserstein Distance for Structured Objects.Algorithms, 2020. URLhttps://hal.science/hal-02971153/document

  37. [38]

    Vishwanathan

    Pinar Yanardag and S.V.N. Vishwanathan. Deep graph kernels. InProceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, page 1365–1374, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450336642. doi: 10.1145/2783258.2783417. URLhttps://doi.org/10.1145/2783258.2783417. Appendix Due t...