Improved convergence rate of kNN graph Laplacians: differentiable self-tuned affinity
Pith reviewed 2026-05-23 18:39 UTC · model grok-4.3
The pith
kNN graph Laplacians converge pointwise to manifold operators at rate O(N^{-2/(d+6)}), up to a log factor.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the manifold data setting, where N i.i.d. samples x_i are drawn from a density p on a d-dimensional unknown manifold embedded in high-dimensional Euclidean space, the kNN graph Laplacian with affinity W_ij = epsilon^{-d/2} k_0 ( ||x_i - x_j||^2 / (epsilon phi(hat rho(x_i), hat rho(x_j))^2) ) converges pointwise as an operator to the limiting manifold operator at rate O(N^{-2/(d+6)}), up to a log factor, when k0 and phi are C^3 and epsilon ~ N^{-2/(d+6)}, k ~ N^{6/(d+6)}.
What carries the argument
The self-tuned affinity that inserts the rescaled kNN distance hat rho(x) into the kernel bandwidth via the symmetric bivariate function phi, combined with the refined analysis of the kNN estimator that controls its error to the optimal order.
If this is right
- The stated rate is obtained at the optimal order that balances theoretical bias and variance errors for the given regularity.
- The convergence holds for the general affinity form when k0 and phi satisfy C^3 regularity and the listed technical conditions.
- A refined analysis of the kNN estimator error is developed that supports the improved rate and may apply to other estimators.
- Numerical experiments on simulated manifold data confirm the predicted convergence behavior.
Where Pith is reading between the lines
- The same self-tuning construction could be tested on other graph operators such as the normalized Laplacian or diffusion maps to check whether similar rate gains appear.
- In practice the result suggests choosing k and epsilon jointly at the derived scalings rather than fixing one and tuning the other.
- The approach may extend to settings with boundary or with varying manifold dimension if the local kNN estimator can be adapted accordingly.
Load-bearing premise
The N data points are independent draws from a density p supported on a smooth d-dimensional manifold embedded in Euclidean space.
What would settle it
Numerical computation of the pointwise operator error on a known manifold (such as the unit sphere) for increasing N, with epsilon and k at the stated scalings, showing that the error fails to decay at rate N^{-2/(d+6)} or better would contradict the claimed convergence rate.
Figures
read the original abstract
In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN graph where the $k$NN distance is used to set the kernel bandwidth adaptively. In this work, we consider a general class of $k$NN graph where the graph affinity is $W_{ij} = \epsilon^{-d/2} k_0 ( \| x_i - x_j \|^2 / \epsilon \phi( \hat \rho(x_i), \hat \rho(x_j) )^2 ) $, with $\hat{\rho}(x)$ being the (rescaled) $k$NN distance at the point $x$, $\phi$ a symmetric bi-variate function, and $k_0$ a non-negative function on $[0,\infty)$. Under the manifold data setting, where $N$ i.i.d. samples $x_i$ are drawn from a density $p$ on a $d$-dimensional unknown manifold embedded in a high dimensional Euclidean space, we prove the operator pointwise convergence of the $k$NN graph Laplacian to the limiting manifold operator (depending on $p$) at the rate of $O(N^{-2/(d+6)})$, up to a log factor, when $k_0$ and $\phi$ have $C^3$ regularity and satisfy other technical conditions. This is obtained when $\epsilon \sim N^{-2/(d+6)}$ and $k \sim N^{6/(d+6)}$, both at the optimal order to balance the theoretical bias and variance errors. Our improved convergence rate is based on a refined analysis of the $k$NN estimator, which can be of independent interest. We validate our theory by numerical experiments on simulated data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to establish pointwise convergence of a general class of kNN graph Laplacians, defined via self-tuned affinities W_ij = ε^{-d/2} k_0 ( ||x_i - x_j||^2 / (ε ϕ(ρ̂(x_i), ρ̂(x_j))^2) ), to the limiting manifold operator (depending on p). Under the manifold data setting with N i.i.d. samples from density p on a d-dimensional manifold embedded in high-dimensional space, it proves operator pointwise convergence at rate O(N^{-2/(d+6)}) up to a log factor, when k_0 and ϕ have C^3 regularity, achieved at the scalings ε ∼ N^{-2/(d+6)} and k ∼ N^{6/(d+6)}. The result relies on a refined analysis of the kNN distance estimator ρ̂.
Significance. If the rate holds after accounting for all error terms, this would constitute a meaningful improvement in convergence guarantees for adaptive (self-tuned) graph Laplacians over prior work on manifold learning, with the refined kNN analysis potentially useful beyond this setting. The numerical validation on simulated data is noted but secondary to the theoretical claim.
major comments (1)
- [refined analysis of the kNN estimator ρ̂ and its insertion into W_ij (bias-variance decomposition)] The bias-variance analysis underlying the claimed O(N^{-2/(d+6)}) rate (balancing ε and k scalings) must explicitly control the covariance between ρ̂(x_i) and ρ̂(x_j) for pairs with ||x_i - x_j|| = O(√ε). Substantial neighborhood overlap induces dependence that the current decomposition appears to treat as weak or absent; an unaccounted covariance term larger than N^{-4/(d+6)} would invalidate the pointwise rate. This is load-bearing for the central convergence claim.
minor comments (1)
- Clarify the precise form of the logarithmic factor in the main theorem statement (rather than only in the abstract) to allow direct verification of the rate.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on our manuscript. We address the single major comment below.
read point-by-point responses
-
Referee: [refined analysis of the kNN estimator ρ̂ and its insertion into W_ij (bias-variance decomposition)] The bias-variance analysis underlying the claimed O(N^{-2/(d+6)}) rate (balancing ε and k scalings) must explicitly control the covariance between ρ̂(x_i) and ρ̂(x_j) for pairs with ||x_i - x_j|| = O(√ε). Substantial neighborhood overlap induces dependence that the current decomposition appears to treat as weak or absent; an unaccounted covariance term larger than N^{-4/(d+6)} would invalidate the pointwise rate. This is load-bearing for the central convergence claim.
Authors: We agree that an explicit bound on Cov(ρ̂(x_i), ρ̂(x_j)) for ||x_i - x_j|| = O(√ε) is essential to close the variance analysis. Our refined kNN estimator analysis (which underpins the improved rate) derives joint concentration bounds that incorporate neighborhood overlap; the C^3 regularity of ϕ and k_0 is used to propagate these into the affinity entries W_ij. Under the stated scalings, the covariance contribution remains strictly smaller than N^{-4/(d+6)} (up to logs). Nevertheless, to eliminate any ambiguity in the current presentation of the decomposition, we will insert a dedicated lemma making the covariance control fully explicit. This constitutes a partial revision. revision: partial
Circularity Check
No circularity: mathematical convergence proof under stated assumptions
full rationale
The paper derives the O(N^{-2/(d+6)}) pointwise convergence rate for the kNN graph Laplacian operator via bias-variance analysis of the kNN distance estimator inserted into the affinity kernel, with epsilon and k balanced at the stated orders. All error terms follow from the manifold sampling assumption, C^3 regularity on k0 and phi, and standard concentration bounds; no parameter is fitted to the target rate, no self-citation chain is load-bearing for the central claim, and the result is not obtained by renaming or self-definition. The derivation is self-contained against the external manifold model.
Axiom & Free-Parameter Ledger
free parameters (2)
- epsilon scaling =
N^{-2/(d+6)}
- k scaling =
N^{6/(d+6)}
axioms (2)
- domain assumption N i.i.d. samples drawn from density p on d-dimensional manifold embedded in high-dimensional Euclidean space.
- domain assumption k0 and φ have C^3 regularity and satisfy other technical conditions.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Wij = ϵ^{-d/2} k0(∥xi−xj∥² / ϵ ϕ(ˆρ(xi),ˆρ(xj))²) … prove … O(N^{-2/(d+6)} √log N) when k0,ϕ ∈ C³ … ε∼N^{-2/(d+6)}, k∼N^{6/(d+6)}
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
¯ρr solves t^d (1 + r Q(x) t²) = 1/p(x) … relative error O((k/N)^{3/d} + √(log N/k))
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Über homogene polynome in (L2)
Stefan Banach. Über homogene polynome in (L2). Studia Mathematica, 7(1):36–44, 1938
work page 1938
-
[2]
Semi-supervised learning on manifolds.Machine Learning Journal, 1, 2002
Mikhail Belkin and Partha Niyogi. Semi-supervised learning on manifolds.Machine Learning Journal, 1, 2002
work page 2002
-
[3]
Laplacian eigenmaps for dimensionality reduction and data representation
Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003
work page 2003
-
[4]
Towards a theoretical foundation for Laplacian-based manifold methods
Mikhail Belkin and Partha Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. Journal of Computer and System Sciences, 74(8):1289–1308, 2008
work page 2008
-
[5]
Amit Bermanis, Moshe Salhov, Guy Wolf, and Amir Averbuch. Measure-based diffusion grid construction and high-dimensional data discretization.Applied and Computational Harmonic Analysis, 40(2):207–228, 2016
work page 2016
-
[6]
Graph approxima- tions to geodesics on embedded manifolds
Mira Bernstein, Vin De Silva, John C Langford, and Joshua B Tenenbaum. Graph approxima- tions to geodesics on embedded manifolds. Technical report, 2000
work page 2000
-
[7]
Variable bandwidth diffusion kernels.Applied and Computational Harmonic Analysis, 40(1):68–96, 2016
Tyrus Berry and John Harlim. Variable bandwidth diffusion kernels.Applied and Computational Harmonic Analysis, 40(1):68–96, 2016
work page 2016
-
[8]
Gérard Biau and Luc Devroye.Lectures on the nearest neighbor method, volume 246. Springer, 2015
work page 2015
-
[9]
Jeff Calder, Nicolas Garcia Trillos, and Marta Lewicka. Lipschitz regularity of graph Laplacians on random data clouds.SIAM Journal on Mathematical Analysis, 54(1):1169–1222, 2022
work page 2022
-
[10]
Improved spectral convergence rates for graph Laplacians on ε-graphs and k-NN graphs
Jeff Calder and Nicolas Garcia Trillos. Improved spectral convergence rates for graph Laplacians on ε-graphs and k-NN graphs. Applied and Computational Harmonic Analysis, 60:123–175, 2022
work page 2022
-
[11]
Xiuyuan Cheng, Alexander Cloninger, and Ronald R Coifman. Two-sample statistics based on anisotropic kernels.Information and Inference: A Journal of the IMA, 9(3):677–719, 2020. 54
work page 2020
-
[12]
Convergence of graph Laplacian with kNN self-tuned kernels
Xiuyuan Cheng and Hau-Tieng Wu. Convergence of graph Laplacian with kNN self-tuned kernels. Information and Inference: A Journal of the IMA, 11(3):889–957, 2022
work page 2022
-
[13]
Xiuyuan Cheng and Nan Wu. Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation.Applied and Computational Harmonic Analysis, 61:132–190, 2022
work page 2022
-
[14]
The manifold scattering transform for high-dimensional point cloud data
Joyce Chew, Holly Steach, Siddharth Viswanath, Hau-Tieng Wu, Matthew Hirn, Deanna Needell, Matthew D Vesely, Smita Krishnaswamy, and Michael Perlmutter. The manifold scattering transform for high-dimensional point cloud data. InTopological, Algebraic and Geometric Learning Workshops 2022, pages 67–78. PMLR, 2022
work page 2022
-
[15]
Diffusion maps.Applied and computational harmonic analysis, 21(1):5–30, 2006
Ronald R Coifman and Stéphane Lafon. Diffusion maps.Applied and computational harmonic analysis, 21(1):5–30, 2006
work page 2006
-
[16]
Sebastian Damrich, Philipp Berens, and Dmitry Kobak. Persistent homology for high- dimensional data based on spectral methods.arXiv preprint arXiv:2311.03087, 2023
-
[17]
The strong uniform consistency of nearest neighbor density estimates
Luc P Devroye and Terry J Wagner. The strong uniform consistency of nearest neighbor density estimates. The Annals of Statistics, pages 536–540, 1977
work page 1977
-
[18]
Manfredo Perdigao Do Carmo and J Flaherty Francis.Riemannian geometry, volume 6. Springer, 1992
work page 1992
-
[19]
David B Dunson, Hau-Tieng Wu, and Nan Wu. Spectral convergence of graph Laplacian and heat kernel reconstruction inL∞ from random samples.Applied and Computational Harmonic Analysis, 55:282–336, 2021
work page 2021
-
[20]
Graph based gaussian processes on restricted domains
David B Dunson, Hau-Tieng Wu, and Nan Wu. Graph based gaussian processes on restricted domains. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):414–439, 2022
work page 2022
-
[21]
Asymptotic behavior of lp-based Laplacian regularization in semi-supervised learning
Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J Wainwright, and Michael I Jordan. Asymptotic behavior of lp-based Laplacian regularization in semi-supervised learning. In Conference on Learning Theory, pages 879–906. PMLR, 2016
work page 2016
-
[22]
Bernardo Fichera, Slava Borovitskiy, Andreas Krause, and Aude G Billard. Implicit manifold gaussian process regression.Advances in Neural Information Processing Systems, 36, 2024
work page 2024
-
[23]
Analysis and algorithms forlp-based semi- supervised learning on graphs
Mauricio Flores, Jeff Calder, and Gilad Lerman. Analysis and algorithms forlp-based semi- supervised learning on graphs. Applied and Computational Harmonic Analysis, 60:77–122, 2022
work page 2022
-
[24]
Nicolás García Trillos, Moritz Gerlach, Matthias Hein, and Dejan Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace–Beltrami operator.Foundations of Computational Mathematics, 20(4):827–887, 2020
work page 2020
-
[25]
Multi-manifold semi-supervised learning
Andrew Goldberg, Xiaojin Zhu, Aarti Singh, Zhiting Xu, and Robert Nowak. Multi-manifold semi-supervised learning. InArtificial intelligence and statistics, pages 169–176. PMLR, 2009
work page 2009
-
[26]
Hélène Guérin, Dinh-Toan Nguyen, and Viet-Chi Tran. Strong uniform convergence of Lapla- cians of random geometric and directed kNN graphs on compact manifolds.arXiv preprint arXiv:2212.10287, 2022. 55
-
[27]
Matthias Hein, Jean-Yves Audibert, and Ulrike von Luxburg. Graph Laplacians and their convergence on random neighborhood graphs.Journal of Machine Learning Research, 8(6), 2007
work page 2007
-
[28]
From graphs to manifolds– weak and strong pointwise consistency of graph Laplacians
Matthias Hein, Jean-Yves Audibert, and Ulrike Von Luxburg. From graphs to manifolds– weak and strong pointwise consistency of graph Laplacians. InInternational Conference on Computational Learning Theory, pages 470–485. Springer, 2005
work page 2005
-
[29]
Franca Hoffmann, Bamdad Hosseini, Assad A Oberai, and Andrew M Stuart. Spectral analysis of weighted Laplacians arising in data clustering.Applied and Computational Harmonic Analysis, 56:189–249, 2022
work page 2022
-
[30]
LDLE: Low distortion local eigenmaps
Dhruv Kohli, Alexander Cloninger, and Gal Mishne. LDLE: Low distortion local eigenmaps. Journal of machine learning research, 22(282):1–64, 2021
work page 2021
-
[31]
YP Mack. Rate of strong uniform convergence of k-NN density estimates.Journal of statistical planning and inference, 8(2):185–192, 1983
work page 1983
-
[32]
Multivariate k-nearest neighbor density estimates.Journal of Multivariate Analysis, 9(1):1–15, 1979
YP Mack and Murray Rosenblatt. Multivariate k-nearest neighbor density estimates.Journal of Multivariate Analysis, 9(1):1–15, 1979
work page 1979
-
[33]
Markus Maier, Matthias Hein, and Ulrike Von Luxburg. Optimal construction of k-nearest- neighbor graphs for identifying noisy clusters.Theoretical Computer Science, 410(19):1749–1764, 2009
work page 2009
-
[34]
Consistency properties of nearest neighbor density function estimators
David S Moore and James W Yackel. Consistency properties of nearest neighbor density function estimators. The Annals of Statistics, 5(1):143–154, 1977
work page 1977
-
[35]
Minimax-optimal semi-supervised regression on unknown manifolds
Amit Moscovich, Ariel Jaffe, and Nadler Boaz. Minimax-optimal semi-supervised regression on unknown manifolds. InArtificial Intelligence and Statistics, pages 933–942. PMLR, 2017
work page 2017
-
[36]
A Martina Neuman. Graph Laplacians on shared nearest neighbor graphs and graph Laplacians on k-nearest neighbor graphs having the same limit.arXiv preprint arXiv:2302.12399, 2023
-
[37]
On spectral clustering: Analysis and an algorithm
Andrew Ng, Michael Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems, 14, 2001
work page 2001
- [38]
-
[39]
Semi-supervised Fréchet regression.arXiv preprint arXiv:2404.10444, 2024
Rui Qiu, Zhou Yu, and Zhenhua Lin. Semi-supervised Fréchet regression.arXiv preprint arXiv:2404.10444, 2024
-
[40]
Chao Shen and Hau-Tieng Wu. Scalability and robustness of spectral embedding: landmark diffusion is all you need.Information and Inference: A Journal of the IMA, 11(4):1527–1595, 2022
work page 2022
-
[41]
Amit Singer. From graph to manifold Laplacian: The convergence rate.Applied and Computa- tional Harmonic Analysis, 21(1):128–134, 2006
work page 2006
-
[42]
Spectral convergence of the connection Laplacian from random samples
Amit Singer and Hau-Tieng Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2017
work page 2017
-
[43]
Analysis ofp-Laplacian regularization in semisupervised learning
Dejan Slepcev and Matthew Thorpe. Analysis ofp-Laplacian regularization in semisupervised learning. SIAM Journal on Mathematical Analysis, 51(3):2085–2120, 2019. 56
work page 2085
-
[44]
Ronen Talmon, Israel Cohen, Sharon Gannot, and Ronald R Coifman. Diffusion maps for signal processing: A deeper look at manifold-learning techniques based on kernels and graphs.IEEE signal processing magazine, 30(4):75–86, 2013
work page 2013
-
[45]
Tao Tang, Nan Wu, Xiuyuan Cheng, and David Dunson. Adaptive bayesian regression on data with low intrinsic dimensionality.arXiv preprint arXiv:2407.09286, 2024
-
[46]
Joshua B Tenenbaum, Vin de Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction.Science, 290(5500):2319–2323, 2000
work page 2000
-
[47]
An analysis of the convergence of graph Laplacians
Daniel Ting, Ling Huang, and Michael I Jordan. An analysis of the convergence of graph Laplacians. In Proceedings of the 27th International Conference on International Conference on Machine Learning, pages 1079–1086, 2010
work page 2010
-
[48]
Nicolás García Trillos, Anna Little, Daniel McKenzie, and James M Murphy. Fermat distances: Metric approximation, spectral convergence, and clustering algorithms.Journal of Machine Learning Research, 25(176):1–65, 2024
work page 2024
-
[49]
A tutorial on spectral clustering.Statistics and computing, 17:395–416, 2007
Ulrike Von Luxburg. A tutorial on spectral clustering.Statistics and computing, 17:395–416, 2007
work page 2007
-
[50]
Hau-Tieng Wu and Nan Wu. Think globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embedding.The Annals of Statistics, 46(6B):3805 – 3837, 2018
work page 2018
-
[51]
Self-tuning spectral clustering
Lihi Zelnik-Manor and Pietro Perona. Self-tuning spectral clustering. Advances in neural information processing systems, 17, 2004
work page 2004
-
[52]
Analysis of kNN density estimation.IEEE Transactions on Information Theory, 68(12):7971–7995, 2022
Puning Zhao and Lifeng Lai. Analysis of kNN density estimation.IEEE Transactions on Information Theory, 68(12):7971–7995, 2022
work page 2022
-
[53]
Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. InProceedings of the 20th International conference on Machine learning (ICML-03), pages 912–919, 2003. Appendix A Algorithm for general k0 in section 7 In this section, we detail how the algorithm described in Section 7 can be adapte...
work page 2003
-
[54]
Analyze the bias error of the independent sums ¯D(x) in the convergence analysis of the random-walk graph LaplaciansLrw and eLrw, where ¯D(x) is defined as in(33) in the proof of Theorem 6.1 and as in(A.161) in the proof of Theorem 6.4, respectively. The expectation of the former sum, which is its population counterpart, is ϵ−d/2 Z M k0 ∥x − y∥2 ϵϕ(¯ρr(x)...
-
[55]
Provide upper bounds for the expectations and variances of independent sums of random variables. Specifically, these expectations and variances can be upper bounded by the kernel integrals in the form of ϵ−d/2 Z M k0 ∥x − y∥2 ϵ max{¯ρr(x), ¯ρr(y)}2 ¯ρr(y)ip(y)dV (y), i = −4, −2, −1, . . . ,2. See (96) and (99) in the proof of Proposition 4.1 for an exampl...
-
[56]
Expand kernel integrals of the form ϵ−d/2 Z M h1 ∥x − y∥2 ϵϕ(¯ρr(x), ¯ρr(y))2 ; ερ,k ¯ρr(y)ip(y)dV (y), i = −2, −1, 0, 1, where h1(η; ερ,k)is as defined in Lemma E.1. This is used in the proofs to bound the replacement errors when k0 is compactly supported, specifically to bound the independent sums containing h1(η; ερ,k) in the bounds given in Lemmas E.1...
-
[57]
Since A ⊂ B2cmax ¯ρr(x)√ϵ(0), we have that A0 = A ∩ ˜A
This implies that the set ˜A := u ∈ B2cmax ¯ρr(x)√ϵ(0), u(1 − 1 2 ∇ ˜βx(0) · u) ≤ ¯ρr(x)√ϵ (A.68) is isomorphic to ad-dimensional ball. Since A ⊂ B2cmax ¯ρr(x)√ϵ(0), we have that A0 = A ∩ ˜A. (A.69) This proves the claim. For˜A, we have ˜A = {tθ, θ ∈ Sd−1, t ∈ [0, ˜rϵ,x(θ)]}, where ˜rϵ,x(θ) = ¯ρr(x)√ϵ + 1 2 ∇ ˜βx(0) · θ¯ρr(x)2ϵ + O[p](ϵ3/2). By (A.69), A0...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.