
arXiv: 2502.10600 · v4 · submitted 2025-02-14 · stat.ML · cs.LG · cs.NA · math.NA

Weighted quantization using MMD: From mean field to mean shift via gradient flows

Pith reviewed 2026-05-23 02:45 UTC · model grok-4.3

classification: stat.ML · cs.LG · cs.NA · math.NA
keywords: MMD quantization · mean shift · Wasserstein-Fisher-Rao gradient flow · interacting particles · clustering · optimal quantization · kernel methods · fixed-point iteration

The pith

A Wasserstein-Fisher-Rao gradient flow on measures, discretized by interacting particles, produces the MSIP algorithm for MMD-optimal weighted quantization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to find a weighted collection of Dirac particles that best matches a target distribution when error is measured by maximum mean discrepancy. It argues that the natural dynamics for this task is a Wasserstein-Fisher-Rao gradient flow and shows that the flow admits an exact discretization as a system of ordinary differential equations for interacting particles. From these ODEs the authors extract a fixed-point iteration called mean shift interacting particles. This iteration is shown to recover the classical mean shift procedure as a special case, to act as a preconditioned gradient step, and to relax Lloyd's algorithm when used for clustering. Numerical tests indicate that the resulting procedures remain stable in high dimensions and on multimodal targets where earlier methods degrade.
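
The objective described above can be made concrete with a small sketch. It assumes a Gaussian kernel and a target represented by finitely many samples; the function names `gaussian_kernel` and `mmd_squared` and the `bandwidth` parameter are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2)) (assumed kernel)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(particles, weights, samples, bandwidth=1.0):
    """Squared MMD between sum_i w_i delta_{y_i} and the empirical measure of `samples`."""
    k_yy = gaussian_kernel(particles, particles, bandwidth)
    k_yx = gaussian_kernel(particles, samples, bandwidth)
    k_xx = gaussian_kernel(samples, samples, bandwidth)
    return float(weights @ k_yy @ weights
                 - 2.0 * weights @ k_yx.mean(axis=1)
                 + k_xx.mean())

# Illustrative data: a 2-D standard normal target and 10 uniformly weighted particles.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 2))
particles = samples[:10].copy()
weights = np.full(10, 0.1)
value = mmd_squared(particles, weights, samples)
print(f"squared MMD of the initial quantization: {value:.4f}")
```

A weighted quantization method would then adjust both `particles` and `weights` to reduce this value.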

Core claim

The Wasserstein-Fisher-Rao gradient flow minimizes MMD between a target probability measure and a weighted atomic measure; its particle discretization yields ordinary differential equations whose equilibria satisfy an extended mean-shift fixed-point equation that simultaneously generalizes mode-finding in kernel density estimation and relaxes Lloyd iteration for clustering.
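
To illustrate how a mean-shift-like fixed-point equation can arise from an MMD objective, the following sketch writes out the stationarity conditions under assumptions that are not taken from the paper: a Gaussian kernel with bandwidth \sigma, unconstrained weights, and the shorthand v_0, v_1 introduced here for illustration. The paper's own statement and notation may differ.

```latex
% Assumed setting (illustrative): Gaussian kernel and weighted atomic measure.
\[
k(y,x) = \exp\!\left(-\frac{\|y-x\|^2}{2\sigma^2}\right), \qquad
\mu = \sum_{i=1}^{N} w_i\,\delta_{y_i}, \qquad
v_0(y) = \int k(y,x)\,\mathrm{d}\pi(x), \qquad
v_1(y) = \int x\,k(y,x)\,\mathrm{d}\pi(x).
\]
% Squared MMD, up to a constant that does not depend on (w, y).
\[
\mathrm{MMD}^2(\mu,\pi) = \sum_{i,j} w_i w_j\,k(y_i,y_j) - 2\sum_{i} w_i\,v_0(y_i) + \mathrm{const}.
\]
% Stationarity in the weights.
\[
\sum_{j} k(y_i,y_j)\,w_j = v_0(y_i).
\]
% Stationarity in the positions (for w_i \neq 0), using
% \nabla_y k(y,x) = \sigma^{-2}(x-y)\,k(y,x).
\[
\sum_{j} w_j\,(y_j - y_i)\,k(y_i,y_j) = v_1(y_i) - y_i\,v_0(y_i).
\]
% Substituting the weight condition into the position condition.
\[
\sum_{j} k(y_i,y_j)\,w_j\,y_j = v_1(y_i), \qquad i = 1,\dots,N.
\]
% Single-particle case N = 1, where k(y_1,y_1) = 1 and w_1 = v_0(y_1).
\[
y_1 = \frac{v_1(y_1)}{v_0(y_1)}.
\]
```

Under these assumptions, the single-particle case is the fixed-point equation of Gaussian mean shift applied to the kernel-smoothed target, which is one way to read the claim that the equilibria extend mode-finding.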

What carries the argument

The Wasserstein-Fisher-Rao gradient flow discretized into a system of interacting-particle ODEs whose fixed-point iteration is the mean shift interacting particles (MSIP) algorithm.

If this is right

  • MSIP recovers the classical mean shift update when all particle weights are forced equal.
  • MSIP can be rewritten as preconditioned gradient descent on the MMD objective.
  • MSIP functions as a relaxation of Lloyd's algorithm when applied to clustering tasks.
  • The particle discretization inherits the robustness properties observed for the underlying gradient flow in high-dimensional and multimodal regimes.
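
For orientation, the two classical procedures named in the list above can be sketched in their textbook forms. This is not the MSIP update from the paper; the Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions made for illustration.

```python
import numpy as np

def mean_shift_step(points, samples, bandwidth=1.0):
    """One Gaussian mean shift update: each point moves to a kernel-weighted mean of the samples."""
    diff = points[:, None, :] - samples[None, :, :]
    k = np.exp(-np.sum(diff**2, axis=2) / (2.0 * bandwidth**2))
    return (k @ samples) / k.sum(axis=1, keepdims=True)

def lloyd_step(centroids, samples):
    """One Lloyd update: assign samples to the nearest centroid, then average each cell."""
    diff = samples[:, None, :] - centroids[None, :, :]
    labels = np.argmin(np.sum(diff**2, axis=2), axis=1)
    new = centroids.copy()
    for j in range(len(centroids)):
        members = samples[labels == j]
        if len(members) > 0:  # a centroid with an empty cell keeps its position
            new[j] = members.mean(axis=0)
    return new

# Illustrative data: two well-separated Gaussian clusters in 2-D.
rng = np.random.default_rng(0)
left = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(200, 2))
right = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(200, 2))
samples = np.vstack([left, right])

# Mean shift: each point climbs independently toward a mode of the kernel density estimate.
modes = np.array([[-2.0, 0.5], [2.0, -0.5]])
for _ in range(50):
    modes = mean_shift_step(modes, samples, bandwidth=1.0)

# Lloyd: hard Voronoi assignment followed by a centroid update.
centroids = lloyd_step(np.array([[-1.0, 0.0], [1.0, 0.0]]), samples)
print(modes, centroids, sep="\n")
```

Mean shift uses soft kernel weights and moves each point independently, while Lloyd uses hard nearest-centroid assignments; the paper's claims concern an iteration that relates to both.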

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same particle ODE discretization could be applied to other discrepancy measures whose gradient flows admit similar mean-field descriptions.
  • Variable particle weights arising from the flow may improve mode recovery in kernel density estimation compared with uniform-weight mean shift.
  • Because MSIP is a relaxation of Lloyd iteration, its convergence rate on finite mixtures may be governed by the same contraction arguments used for k-means.

Load-bearing premise

That the Wasserstein-Fisher-Rao gradient flow on measures can be discretized into stable particle ODEs and a convergent fixed-point iteration without further conditions on the kernel or the target distribution.

What would settle it

A concrete counter-example in which the MSIP fixed-point iteration either diverges or converges to a weighted particle set whose MMD distance to the target exceeds that achieved by standard mean-shift or Lloyd methods on the same data.
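
Any such comparison needs a common scoring function. The sketch below scores two candidate quantizations with the same particle positions, one with uniform weights and one with weights refitted by solving the linear system K w = v, using the squared MMD against an empirical target. The Gaussian kernel, the small `ridge` term, and all function names are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2)) (assumed kernel)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(particles, weights, samples, bandwidth=1.0):
    """Squared MMD between the weighted particles and the empirical measure of `samples`."""
    k_yy = gaussian_kernel(particles, particles, bandwidth)
    v = gaussian_kernel(particles, samples, bandwidth).mean(axis=1)
    k_xx = gaussian_kernel(samples, samples, bandwidth)
    return float(weights @ k_yy @ weights - 2.0 * weights @ v + k_xx.mean())

def refit_weights(particles, samples, bandwidth=1.0, ridge=1e-10):
    """Unconstrained least-squares weights at fixed positions: solve (K + ridge I) w = v."""
    k_yy = gaussian_kernel(particles, particles, bandwidth)
    v = gaussian_kernel(particles, samples, bandwidth).mean(axis=1)
    return np.linalg.solve(k_yy + ridge * np.eye(len(particles)), v)

# Illustrative data: 400 target samples, 8 particles taken from the samples.
rng = np.random.default_rng(1)
samples = rng.normal(size=(400, 2))
particles = samples[:8].copy()

score_uniform = mmd_squared(particles, np.full(8, 1.0 / 8.0), samples)
score_refit = mmd_squared(particles, refit_weights(particles, samples), samples)
print(f"uniform weights: {score_uniform:.5f}, refitted weights: {score_refit:.5f}")
```

The refitted weights are not constrained to be positive or to sum to one, so a comparison restricted to probability weights would need a constrained solver instead.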

Figures

Figures reproduced from arXiv: 2502.10600 by Ayoub Belhadji, Daniel Sharp, Youssef Marzouk.

Figure 1: Comparison of quantization algorithms on a joker distribution,
Figure 2: Comparison of different quantization algorithms on a GMM. (Left): dimension
Figure 3: Comparing quantizations of MNIST. We now illustrate our algorithms using the MNIST dataset [60]; for further results, see Appendix A.6.2. We compare MSIP, Lloyd’s algorithm, WFR, IFTflow, MMDGF, DMGD, and classical (non-interacting) mean shift (IIDMS). When Lloyd’s algorithm produces an empty Voronoi cell, we make the corresponding particle retain its position
Figure 4: First five univariate and pairwise marginals of the 100-dimensional distribution used in
Figure 5: Trajectories of four algorithms started at two different initializations (yellow and red
Figure 6: Comparison of the dynamics of mean shift and the discretization of MMD gradient flow
Figure 7: Comparison of four algorithms on MNIST for the iteration
Figure 8: Comparison of different algorithms’ quantization of MNIST with
Figure 9: Final configuration of MSIP and WFR-IPS compared to Lloyd’s algorithm with identical
Figure 10: Weights of WFR-IPS trajectories, marginalizing out time: The weights are sorted at
Figure 11: Weights of MSIP final configurations. The weights increase from left to right (ordering statistic subscript [j] is the jth smallest). (Top): Checkers target. (Bottom): Rings target.
Original abstract

Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a weighted mixture of Dirac measures that best approximates the target distribution. While much existing work relies on the Wasserstein distance to quantify approximation errors, maximum mean discrepancy (MMD) has received comparatively less attention, especially when allowing for variable particle weights. We argue that a Wasserstein-Fisher-Rao gradient flow is well-suited for designing quantizations optimal under MMD. We show that a system of interacting particles satisfying a set of ODEs discretizes this flow. We further derive a new fixed-point algorithm called mean shift interacting particles (MSIP). We show that MSIP extends the classical mean shift algorithm, widely used for identifying modes in kernel density estimators. Moreover, we show that MSIP can be interpreted as preconditioned gradient descent and that it acts as a relaxation of Lloyd's algorithm for clustering. Our unification of gradient flows, mean shift, and MMD-optimal quantization yields algorithms that are more robust than state-of-the-art methods, as demonstrated via high-dimensional and multi-modal numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper argues that a Wasserstein-Fisher-Rao gradient flow is well-suited for MMD-optimal weighted quantization of a target distribution. It shows that a system of interacting particles obeying a set of ODEs discretizes this flow, derives the mean shift interacting particles (MSIP) fixed-point algorithm from it, and claims that MSIP extends the classical mean shift algorithm, can be viewed as preconditioned gradient descent, and acts as a relaxation of Lloyd's algorithm. High-dimensional and multi-modal experiments are presented to demonstrate greater robustness than existing methods.

Significance. If the derivations are correct, the work supplies a principled gradient-flow route from MMD quantization to a practical fixed-point iteration that recovers and extends mean shift, offering a new algorithmic unification with potential advantages for clustering and particle-based approximation.

minor comments (2)
  1. [Abstract] The statement that the ODE system 'discretizes this flow' and that MSIP 'extends' mean shift would benefit from a one-sentence pointer to the precise discretization scheme and the sense in which the extension holds (e.g., recovery of the classical update when weights are uniform).
  2. [Abstract] The manuscript should clarify whether any restrictions on the kernel or target measure are required for the WFR flow to remain well-defined and for the MSIP iteration to preserve the claimed optimality properties.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its significance, and recommendation of minor revision. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity


The paper's central chain—from Wasserstein-Fisher-Rao gradient flow on measures, through interacting-particle ODE discretization, to the MSIP fixed-point iteration—is presented as a forward derivation that produces new algorithms (extensions of mean shift, relaxation of Lloyd). No equation or claim reduces a claimed prediction or result to a quantity defined by the same fitted parameters or by self-citation; the abstract and provided text contain no self-referential definitions, fitted-input renamings, or load-bearing uniqueness theorems imported from the authors' prior work. The derivation is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

An abstract-only review supplies no information on free parameters, background axioms, or newly postulated entities; the full manuscript would be required to populate the ledger.

pith-pipeline@v0.9.0 · 5750 in / 1233 out tokens · 31844 ms · 2026-05-23T02:45:55.751869+00:00 · methodology



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Stationary MMD Points

    stat.ML 2025-05 unverdicted novelty 7.0

    Stationary MMD points show super-convergence in integration error over MMD for RKHS integrands, and MMD gradient flows compute them with a new non-asymptotic finite-particle error bound.

  2. A note on the unique properties of the Kullback-Leibler divergence for sampling via gradient flows

    stat.ME 2025-07 unverdicted novelty 6.0

    The Kullback-Leibler divergence is the only Bregman divergence whose gradient flow with respect to many popular metrics does not require the normalizing constant of the target distribution π.

Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages · cited by 2 Pith papers · 26 internal anchors

  1. [1]

    Neural Wasserstein gradient flows for maximum mean discrepancies with Riesz kernels

    F. Altekrüger, J. Hertrich, and G. Steidl, “Neural Wasserstein gradient flows for maximum mean discrepancies with Riesz kernels”, ICML, 2023 arXiv:2301.11624

  2. [2]

    Gradient flows: in metric spaces and in the space of probability measures

    L. Ambrosio, N. Gigli, and G. Savaré, “Gradient flows: in metric spaces and in the space of probability measures”, Springer Science & Business Media, 2008

  3. [3]

    Maximum mean discrepancy gradient flow

    M. Arbel, A. Korba, A. Salim, and A. Gretton, “Maximum mean discrepancy gradient flow”, Advances in Neural Information Processing Systems 32 (2019) arXiv:1906.04370

  4. [4]

    On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm

    E. Arias-Castro, D. Mason, and B. Pelletier, “On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm”, Journal of Machine Learning Research 17 (2016), no. 206, 1–4

  5. [5]

    On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions

    F. Bach, “On the equivalence between kernel quadrature rules and random feature expansions”, The Journal of Machine Learning Research 18 (2017), no. 1, 714–751, arXiv:1502.06800

  6. [6]

    Kernel quadrature with DPPs

    A. Belhadji, R. Bardenet, and P. Chainais, “Kernel quadrature with DPPs”, Advances in Neural Information Processing Systems 32 (2019) 12907–12917, arXiv:1906.07832

  7. [7]

    Kernel interpolation with continuous volume sampling

    A. Belhadji, R. Bardenet, and P. Chainais, “Kernel interpolation with continuous volume sampling”, Proceedings of the 37th International Conference on Machine Learning , 2020 725–735, arXiv:2002.09677

  8. [8]

    An analysis of Ermakov–Zolotukhin quadrature using kernels

    A. Belhadji, “An analysis of Ermakov–Zolotukhin quadrature using kernels”, Advances in Neural Information Processing Systems 34 (2021) 27278–27289, arXiv:2309.01200

  9. [9]

    Sketch and shift: a robust decoder for compressive clustering

    A. Belhadji and R. Gribonval, “Sketch and shift: a robust decoder for compressive clustering”, Transactions on Machine Learning Research, 2024 arXiv:2312.09940

  10. [10]

    Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees

    F.-X. Briol, C. Oates, M. Girolami, and M. A. Osborne, “Frank-Wolfe Bayesian quadrature: Probabilistic integration with theoretical guarantees”, Advances in Neural Information Processing Systems 28 (2015) arXiv:1506.02681

  11. [11]

    Gaussian mean-shift is an EM algorithm

    M. A. Carreira-Perpiñán, “Gaussian mean-shift is an EM algorithm”, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007), no. 5, 767–776

  12. [12]

    A review of mean-shift algorithms for clustering

    M. A. Carreira-Perpiñán, “A review of mean-shift algorithms for clustering”, arXiv preprint, 2015 arXiv:1503.00687

  13. [13]

    A blob method for diffusion

    J. A. Carrillo, K. Craig, and F. S. Patacchini, “A blob method for diffusion”, Calculus of Variations and Partial Differential Equations 58 (2019) 1–53, arXiv:1709.09195

  14. [14]

    Efficient numerical integration in reproducing kernel Hilbert spaces via leverage scores sampling

    A. Chatalic, N. Schreuder, E. De Vito, and L. Rosasco, “Efficient numerical integration in reproducing kernel Hilbert spaces via leverage scores sampling”, arXiv preprint, 2023 arXiv:2311.13548

  15. [15]

    Stein Points

    W. Chen, L. Mackey, J. Gorham, F. Briol, and C. Oates, “Stein points”, in “Proceedings of the 35th International Conference on Machine Learning”, J. Dy and A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research, pp. 844–853. PMLR, 10–15 Jul 2018. arXiv:1803.10161

  16. [16]

    Mean shift, mode seeking, and clustering

    Y. Cheng, “Mean shift, mode seeking, and clustering”, IEEE transactions on pattern analysis and machine intelligence 17 (1995), no. 8, 790–799

  17. [17]

    Statistical optimal transport

    S. Chewi, J. Niles-Weed, and P. Rigollet, “Statistical optimal transport”, arXiv preprint, 2024 arXiv:2407.18163

  18. [18]

    SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence

    S. Chewi, T. Le Gouic, C. Lu, T. Maunu, and P. Rigollet, “SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence”, Advances in Neural Information Processing Systems 33 (2020) 2098–2109, arXiv:2006.02509

  19. [19]

    Bandwidth selection for kernel density estimation

    S.-T. Chiu, “Bandwidth selection for kernel density estimation”, The Annals of Statistics , 1991 1883–1905

  20. [20]

    On lazy training in differentiable programming

    L. Chizat, E. Oyallon, and F. Bach, “On lazy training in differentiable programming”, Advances in neural information processing systems 32 (2019) arXiv:1812.07956

  21. [21]

    An Interpolating Distance between Optimal Transport and Fisher-Rao

    L. Chizat, G. Peyré, B. Schmitzer, and F. Vialard, “An interpolating distance between optimal transport and Fisher–Rao metrics”, Foundations of Computational Mathematics 18 (2018) 1–44, arXiv:1506.06430

  22. [22]

    Sparse optimization on measures with over-parameterized gradient descent

    L. Chizat, “Sparse optimization on measures with over-parameterized gradient descent”, Mathematical Programming 194 (2022), no. 1, 487–532, arXiv:1907.10300

  23. [23]

    Mean shift: A robust approach toward feature space analysis

    D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis”, IEEE Transactions on pattern analysis and machine intelligence 24 (2002), no. 5, 603–619

  24. [24]

    A Blob Method for the Aggregation Equation

    K. Craig and A. Bertozzi, “A blob method for the aggregation equation”, Mathematics of computation 85 (2016), no. 300, 1681–1717, arXiv:1405.6424

  25. [25]

    Exact Reconstruction using Beurling Minimal Extrapolation

    Y. De Castro and F. Gamboa, “Exact reconstruction using Beurling minimal extrapolation”, Journal of Mathematical Analysis and Applications 395 (2012), no. 1, 336–354, arXiv:1103.4951

  26. [26]

    On optimal center locations for radial basis function interpolation: computational aspects

    S. De M., “On optimal center locations for radial basis function interpolation: computational aspects”, Rend. Splines Radial Basis Functions and Applications 61 (2003), no. 3, 343–358

  27. [27]

    Near-optimal data-independent point locations for radial basis function interpolation

    S. De M., R. Schaback, and H. Wendland, “Near-optimal data-independent point locations for radial basis function interpolation”, Advances in Computational Mathematics 23 (2005) 317–330

  28. [28]

    Centroidal Voronoi tessellations: Applications and algorithms

    Q. Du, V. Faber, and M. Gunzburger, “Centroidal Voronoi tessellations: Applications and algorithms”, SIAM review 41 (1999), no. 4, 637–676

  29. [29]

    Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations

    Q. Du, M. Emelianenko, and L. Ju, “Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations”, SIAM journal on numerical analysis 44 (2006), no. 1, 102–119

  30. [30]

    Generalized kernel thinning

    R. Dwivedi and L. Mackey, “Generalized kernel thinning”, in “The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022”. OpenReview.net, 2022. arXiv:2110.01593

  31. [31]

    Kernel thinning

    R. Dwivedi and L. Mackey, “Kernel thinning”, Journal of Machine Learning Research 25 (2024), no. 152, 1–77, arXiv:2105.05842

  32. [32]

    Training generative neural networks via Maximum Mean Discrepancy optimization

    G. K. Dziugaite, D. M. Roy, and Z. Ghahramani, “Training generative neural networks via maximum mean discrepancy optimization”, arXiv preprint, 2015 arXiv:1505.03906

  33. [33]

    Optimal Monte Carlo integration on closed manifolds

    M. Ehler, M. Gräf, and C. J. Oates, “Optimal Monte Carlo integration on closed manifolds”, Statistics and Computing 29 (2019), no. 6, 1203–1214, arXiv:1707.04723

  34. [34]

    Nondegeneracy and weak global convergence of the Lloyd algorithm in Rd

    M. Emelianenko, L. Ju, and A. Rand, “Nondegeneracy and weak global convergence of the Lloyd algorithm in Rd”, SIAM Journal on Numerical Analysis 46 (2008), no. 3, 1423–1441

  35. [35]

    Kernel quadrature with randomly pivoted Cholesky

    E. Epperly and E. Moreno, “Kernel quadrature with randomly pivoted Cholesky”, Advances in Neural Information Processing Systems 36 (2023) 65850–65868, arXiv:2306.03955

  36. [36]

    The estimation of the gradient of a density function, with applications in pattern recognition

    K. Fukunaga and L. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition”, IEEE Transactions on information theory 21 (1975), no. 1, 32–40

  37. [37]

    A JKO splitting scheme for Kantorovich-Fisher-Rao gradient flows

    T. O. Gallouët and L. Monsaingeon, “A JKO splitting scheme for Kantorovich–Fisher–Rao gradient flows”, SIAM Journal on Mathematical Analysis 49 (2017), no. 2, 1100–1130, arXiv:1602.04457

  38. [38]

    On the Convergence of the Mean Shift Algorithm in the One-Dimensional Space

    Y. A. Ghassabeh, “On the convergence of the mean shift algorithm in the one-dimensional space”, Pattern Recognition Letters 34 (2013), no. 12, 1423–1427, arXiv:1407.2961

  39. [39]

    A sufficient condition for the convergence of the mean shift algorithm with Gaussian kernel

    Y. A. Ghassabeh, “A sufficient condition for the convergence of the mean shift algorithm with Gaussian kernel”, Journal of Multivariate Analysis 135 (2015) 1–10

  40. [40]

    Interaction-force transport gradient flows

    E. Gladin, P. Dvurechensky, A. Mielke, and J.-J. Zhu, “Interaction-force transport gradient flows”, in “Advances in Neural Information Processing Systems”, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, eds., vol. 37, pp. 14484–14508. Curran Associates, Inc., 2024. arXiv:2405.17075

  41. [41]

    KALE flow: A relaxed KL gradient flow for probabilities with disjoint support

    P. Glaser, M. Arbel, and A. Gretton, “KALE flow: A relaxed KL gradient flow for probabilities with disjoint support”, Advances in Neural Information Processing Systems 34 (2021) 8018–8031, arXiv:2106.08929

  42. [42]

    Foundations of quantization for probability distributions

    S. Graf and H. Luschgy, “Foundations of quantization for probability distributions”, Springer Science & Business Media, 2000

  43. [43]

    A kernel statistical test of independence

    A. Gretton, K. Fukumizu, C. Teo, L. Song, B. Schölkopf, and A. Smola, “A kernel statistical test of independence”, Advances in neural information processing systems 20 (2007)

  44. [44]

    A kernel two-sample test

    A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test”, The Journal of Machine Learning Research 13 (2012), no. 1, 723–773

  45. [45]

    Positively weighted kernel quadrature via subsampling

    S. Hayakawa, H. Oberhauser, and T. Lyons, “Positively weighted kernel quadrature via subsampling”, Advances in Neural Information Processing Systems 35 (2022) 6886–6900, arXiv:2107.09597

  46. [46]

    Sampling-based Nyström approximation and kernel quadrature

    S. Hayakawa, H. Oberhauser, and T. Lyons, “Sampling-based Nyström approximation and kernel quadrature”, in “International Conference on Machine Learning”, pp. 12678–12699, PMLR. 2023.

  47. [47]

    Generative sliced MMD flows with Riesz kernels

    J. Hertrich, C. Wald, F. Altekrüger, and P. Hagemann, “Generative sliced MMD flows with Riesz kernels”, in “The Twelfth International Conference on Learning Representations”. 2024. arXiv:2305.11463

  48. [48]

    Optimally-Weighted Herding is Bayesian Quadrature

    F. Huszár and D. Duvenaud, “Optimally–weighted herding is Bayesian quadrature”, arXiv preprint, 2012 arXiv:1204.1664

  49. [49]

    The variational formulation of the Fokker–Planck equation

    R. Jordan, D. Kinderlehrer, and F. Otto, “The variational formulation of the Fokker–Planck equation”, SIAM journal on mathematical analysis 29 (1998), no. 1, 1–17

  50. [50]

    Fully symmetric kernel quadrature

    T. Karvonen and S. Särkkä, “Fully symmetric kernel quadrature”, SIAM Journal on Scientific Computing 40 (2018), no. 2, A697–A720, arXiv:1703.06359

  51. [51]

    Gaussian kernel quadrature at scaled Gauss-Hermite nodes

    T. Karvonen and S. Särkkä, “Gaussian kernel quadrature at scaled Gauss–Hermite nodes”, BIT Numerical Mathematics 59 (2019), no. 4, 877–902, arXiv:1803.09532

  52. [52]

    Kernel-based interpolation at approximate Fekete points

    T. Karvonen, S. Särkkä, and K. Tanaka, “Kernel-based interpolation at approximate Fekete points”, Numerical Algorithms 87 (2021) 445–468, arXiv:1912.07316

  53. [53]

    On the positivity and magnitudes of Bayesian quadrature weights

    T. Karvonen, M. Kanagawa, and S. Särkkä, “On the positivity and magnitudes of Bayesian quadrature weights”, Statistics and Computing 29 (2019) 1317–1333, arXiv:1812.08509

  54. [54]

    Numerical methods for nonlinear equations

    C. T. Kelley, “Numerical methods for nonlinear equations”, Acta Numerica 27 (2018) 207–287

  55. [55]

    Exponential rate of convergence for Lloyd’s method I

    J. Kieffer, “Exponential rate of convergence for Lloyd’s method I”, IEEE Transactions on Information Theory 28 (1982), no. 2, 205–210

  56. [56]

    A new optimal transport distance on the space of finite Radon measures

    S. Kondratyev, L. Monsaingeon, and D. Vorotnikov, “A new optimal transport distance on the space of finite Radon measures”, Advances in Differential Equations 21 November (2016) arXiv:1505.07746

  57. [57]

    Kernel Stein discrepancy descent

    A. Korba, P. Aubin-Frankowski, S. Majewski, and P. Ablin, “Kernel Stein discrepancy descent”, in “International Conference on Machine Learning”, pp. 5719–5730, PMLR. 2021. arXiv:2105.09994

  58. [58]

    Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering

    S. Lacoste-Julien, F. Lindsten, and F. Bach, “Sequential kernel herding: Frank-Wolfe optimization for particle filtering”, in “Artificial Intelligence and Statistics”, pp. 544–552, PMLR. 2015. arXiv:1501.02056

  59. [59]

    Numba: A LLVM-based Python JIT compiler

    S. K. Lam, A. Pitrou, and S. Seibert, “Numba: A LLVM-based Python JIT compiler”, in “Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC”, pp. 1–6. 2015

  60. [60]

    MNIST handwritten digit database

    Y. LeCun, C. Cortes, and C. Burges, “MNIST handwritten digit database”, ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010)

  61. [61]

    MMD GAN: Towards Deeper Understanding of Moment Matching Network

    C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, and B. Póczos, “MMD GAN: Towards deeper understanding of moment matching network”, Advances in neural information processing systems 30 (2017) arXiv:1705.08584

  62. [62]

    A note on the convergence of the mean shift

    X. Li, Z. Hu, and F. Wu, “A note on the convergence of the mean shift”, Pattern recognition 40 (2007), no. 6, 1756–1762

  63. [63]

    Optimal Entropy-Transport problems and a new Hellinger-Kantorovich distance between positive measures

    M. Liero, A. Mielke, and G. Savaré, “Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures”, Inventiones mathematicae 211 (2018), no. 3, 969–1117, arXiv:1508.07941

  64. [64]

    Stein variational gradient descent: A general purpose Bayesian inference algorithm

    Q. Liu and D. Wang, “Stein variational gradient descent: A general purpose Bayesian inference algorithm”, Advances in neural information processing systems 29 (2016) arXiv:1608.04471

  65. [65]

    Birth–death dynamics for sampling: global convergence, approximations and their asymptotics

    Y. Lu, D. Slepčev, and L. Wang, “Birth–death dynamics for sampling: global convergence, approximations and their asymptotics”, Nonlinearity 36 (2023), no. 11, 5731, arXiv:2211.00450

  66. [66]

    Accelerating Langevin Sampling with Birth-death

    Y. Lu, J. Lu, and J. Nolen, “Accelerating Langevin sampling with birth-death”, arXiv preprint, 2019 arXiv:1905.09863

  67. [67]

    Sampling in unit time with kernel Fisher–Rao flow

    A. Maurais and Y. Marzouk, “Sampling in unit time with kernel Fisher–Rao flow”, in “Proceedings of the 41st International Conference on Machine Learning”, vol. 235 of Proceedings of Machine Learning Research, pp. 35138–35162. PMLR, 21–27 Jul 2024. arXiv:2401.03892

  68. [68]

    Kernel mean embedding of distributions: A review and beyond

    K. Muandet, K. Fukumizu, B. Sriperumbudur, B. Schölkopf, et al., “Kernel mean embedding of distributions: A review and beyond”, Foundations and Trends® in Machine Learning 10 (2017), no. 1-2, 1–141, arXiv:1605.09522

  69. [69]

    Slice sampling

    R. M. Neal, “Slice sampling”, The Annals of Statistics 31 June (2003)

  70. [70]

    Construction of optimal cubature algorithms with applications to econometrics and uncertainty quantification

    J. Oettershagen, “Construction of optimal cubature algorithms with applications to econometrics and uncertainty quantification”, Verlag Dr. Hut, 2017

  71. [71]

    The geometry of dissipative evolution equations: the porous medium equation

    F. Otto, “The geometry of dissipative evolution equations: the porous medium equation”, Communications in Partial Differential Equations , 2001

  72. [72]

    Statistically efficient thinning of a Markov chain sampler

    A. B. Owen, “Statistically efficient thinning of a Markov chain sampler”, Journal of Computational and Graphical Statistics 26 (2017), no. 3, 738–744, arXiv:1510.07727

  73. [73]

    Pointwise convergence of the Lloyd algorithm in higher dimension

    G. Pagès and J. Yu, “Pointwise convergence of the Lloyd algorithm in higher dimension”, SIAM Journal on Control and Optimization 54 (2016), no. 5, 2354–2382, arXiv:1401.0192

  74. [74]

    Computational optimal transport

    G. Peyré and M. Cuturi, “Computational optimal transport: With applications to data science”, Foundations and Trends® in Machine Learning 11 (2019), no. 5-6, 355–607, arXiv:1803.00567

  75. [75]

    n-Widths in Approximation Theory

    A. Pinkus, “n-Widths in Approximation Theory”, Springer Science & Business Media, 2012

  76. [76]

    On the sequential convergence of Lloyd's algorithms

    L. Portales, E. Cazelles, and E. Pauwels, “On the sequential convergence of Lloyd’s algorithms”, arXiv preprint, 2024 arXiv:2405.20744

  77. [77]

    Interactive supercomputing on 40,000 cores for machine learning and data analysis

    A. Reuther, J. Kepner, C. Byun, S. Samsi, W. Arcand, D. Bestor, B. Bergeron, V. Gadepally, M. Houle, M. Hubbell, M. Jones, A. Klein, L. Milechin, J. Mullen, A. Prout, A. Rosa, C. Yee, and P. Michaleas, “Interactive supercomputing on 40,000 cores for machine learning and data analysis”, in “2018 IEEE High Performance extreme Computing Conference (HPEC)”, p...

  78. [78]

    Optimal thinning of MCMC output

    M. Riabiz, W. Y. Chen, J. Cockayne, P. Swietach, S. A. Niederer, L. Mackey, and C. J. Oates, “Optimal thinning of MCMC output”, Journal of the Royal Statistical Society Series B: Statistical Methodology 84 (2022), no. 4, 1059–1081, arXiv:2005.03952

  79. [79]

    Monte Carlo statistical methods

    C. P. Robert and G. Casella, “Monte Carlo statistical methods”, Springer, 1999

  80. [80]

    Global convergence of neuron birth-death dynamics

    G. Rotskoff, S. Jelassi, J. Bruna, and E. Vanden-Eijnden, “Global convergence of neuron birth-death dynamics”, in “International Conference on Machine Learning”. 2019. arXiv:1902.01843

Showing first 80 references.