Weighted quantization using MMD: From mean field to mean shift via gradient flows

Ayoub Belhadji; Daniel Sharp; Youssef Marzouk

arXiv: 2502.10600 · v4 · submitted 2025-02-14 · 📊 stat.ML · cs.LG · cs.NA · math.NA


Pith reviewed 2026-05-23 02:45 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · cs.NA · math.NA

keywords: MMD quantization · mean shift · Wasserstein-Fisher-Rao gradient flow · interacting particles · clustering · optimal quantization · kernel methods · fixed-point iteration


The pith

A Wasserstein-Fisher-Rao gradient flow on measures, discretized by interacting particles, produces the MSIP algorithm for MMD-optimal weighted quantization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to find a weighted collection of Dirac particles that best matches a target distribution when error is measured by maximum mean discrepancy. It argues that the natural dynamics for this task is a Wasserstein-Fisher-Rao gradient flow and shows that the flow admits an exact discretization as a system of ordinary differential equations for interacting particles. From these ODEs the authors extract a fixed-point iteration called mean shift interacting particles. This iteration is shown to recover the classical mean shift procedure as a special case, to act as a preconditioned gradient step, and to relax Lloyd's algorithm when used for clustering. Numerical tests indicate that the resulting procedures remain stable in high dimensions and on multimodal targets where earlier methods degrade.

Core claim

The Wasserstein-Fisher-Rao gradient flow minimizes MMD between a target probability measure and a weighted atomic measure; its particle discretization yields ordinary differential equations whose equilibria satisfy an extended mean-shift fixed-point equation that simultaneously generalizes mode-finding in kernel density estimation and relaxes Lloyd iteration for clustering.
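To make the objective in this claim concrete, the sketch below evaluates the squared MMD between a weighted particle set and an empirical stand-in for the target, using NumPy. The Gaussian kernel, the bandwidth, the function names (`gaussian_kernel`, `mmd_squared`, `optimal_weights`), and the synthetic data are all assumptions chosen for illustration; none is taken from the paper. The last helper relies on the standard fact that, for fixed positions, the unconstrained weights minimizing the squared MMD solve a linear system in the kernel matrix.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(particles, weights, samples, bandwidth=1.0):
    """Squared MMD between sum_i w_i delta_{y_i} and the empirical target."""
    k_yy = gaussian_kernel(particles, particles, bandwidth)
    k_yx = gaussian_kernel(particles, samples, bandwidth)
    k_xx = gaussian_kernel(samples, samples, bandwidth)
    v = k_yx.mean(axis=1)  # target kernel mean embedding at each particle
    return float(weights @ k_yy @ weights - 2.0 * weights @ v + k_xx.mean())


def optimal_weights(particles, samples, bandwidth=1.0, jitter=1e-10):
    """Unconstrained MMD-minimizing weights for fixed particles: solve K w = v."""
    k_yy = gaussian_kernel(particles, particles, bandwidth)
    v = gaussian_kernel(particles, samples, bandwidth).mean(axis=1)
    # A small jitter (an assumption) keeps the solve stable if K is ill-conditioned.
    return np.linalg.solve(k_yy + jitter * np.eye(len(particles)), v)


rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 2))    # stand-in target samples (hypothetical)
particles = rng.normal(size=(10, 2))   # arbitrary particle positions
uniform = np.full(10, 1.0 / 10)

mmd_uniform = mmd_squared(particles, uniform, samples)
mmd_optimal = mmd_squared(particles, optimal_weights(particles, samples), samples)
print(mmd_uniform, mmd_optimal)
```

These unconstrained weights need not be nonnegative or sum to one; whether and how the paper constrains them is not stated in the text above.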

What carries the argument

The Wasserstein-Fisher-Rao gradient flow discretized into a system of interacting-particle ODEs whose fixed-point iteration is the mean shift interacting particles (MSIP) algorithm.
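As a rough guide to what such a particle system looks like, the block below writes out a generic Wasserstein-Fisher-Rao particle flow for the energy F = MMD²/2. The notation (target π, kernel k, positions y_i, weights w_i, first variation V, rate constants α and β) is assumed here for illustration, and the paper's exact scaling, and whether it renormalizes total mass, may differ.

```latex
% Assumed notation, not taken from the paper:
% \mu = \sum_{i=1}^{M} w_i \delta_{y_i} is the weighted atomic measure,
% \pi the target, k the kernel, and \alpha, \beta > 0 rate constants.
\begin{align*}
  F(\mu) &= \tfrac{1}{2}\,\mathrm{MMD}^2(\mu, \pi), \\
  V(y) := \frac{\delta F}{\delta \mu}(y)
       &= \sum_{j=1}^{M} w_j\, k(y, y_j) - \int k(y, x)\,\mathrm{d}\pi(x), \\
  \dot{y}_i &= -\alpha\, \nabla V(y_i)
       \quad \text{(transport, the Wasserstein part)}, \\
  \dot{w}_i &= -\beta\, w_i\, V(y_i)
       \quad \text{(reaction, the Fisher-Rao part)}.
\end{align*}
```

In this generic form, positions follow the gradient of the first variation while weights grow or shrink according to its value, which is why a single flow can move particles and reweight them at once.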

If this is right

MSIP recovers the classical mean shift update when all particle weights are forced equal.
MSIP can be rewritten as preconditioned gradient descent on the MMD objective.
MSIP functions as a relaxation of Lloyd's algorithm when applied to clustering tasks.
The particle discretization inherits the robustness properties observed for the underlying gradient flow in high-dimensional and multimodal regimes.
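For reference, the classical mean shift update that these statements compare against can be sketched as follows. This is the textbook Gaussian-kernel procedure, not the paper's MSIP iteration; the function names, bandwidth, and synthetic two-cluster data are assumptions made for illustration.

```python
import numpy as np


def mean_shift_step(points, samples, bandwidth=1.0):
    """One classical mean shift update with a Gaussian kernel.

    Each point moves to the kernel-weighted average of the samples:
        y <- sum_n k(y, x_n) x_n / sum_n k(y, x_n).
    """
    diffs = points[:, None, :] - samples[None, :, :]               # (M, N, d)
    k = np.exp(-np.sum(diffs**2, axis=2) / (2.0 * bandwidth**2))   # (M, N)
    return (k @ samples) / k.sum(axis=1, keepdims=True)


def mean_shift(points, samples, bandwidth=1.0, tol=1e-8, max_iter=500):
    """Iterate the update until no point moves more than `tol`."""
    for _ in range(max_iter):
        new_points = mean_shift_step(points, samples, bandwidth)
        if np.max(np.linalg.norm(new_points - points, axis=1)) < tol:
            return new_points
        points = new_points
    return points


rng = np.random.default_rng(0)
# Two well-separated clusters as a stand-in for a bimodal target (hypothetical data).
samples = np.vstack([
    rng.normal(loc=-3.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(200, 2)),
])
starts = np.array([[-2.0, -2.0], [2.5, 3.5]])
modes = mean_shift(starts, samples, bandwidth=0.5)
print(modes)
```

Each starting point is updated independently of the others here, which is the non-interacting behaviour that an interacting-particle scheme would modify.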

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same particle ODE discretization could be applied to other discrepancy measures whose gradient flows admit similar mean-field descriptions.
Variable particle weights arising from the flow may improve mode recovery in kernel density estimation compared with uniform-weight mean shift.
Because MSIP is a relaxation of Lloyd iteration, its convergence rate on finite mixtures may be governed by the same contraction arguments used for k-means.
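For readers less familiar with the Lloyd iteration referred to here, a minimal NumPy sketch of the standard algorithm follows. It is the textbook procedure, not the paper's relaxation of it; the function names and synthetic data are assumptions. The empty-cell rule (the center keeps its position) mirrors the convention quoted in the Figure 3 caption.

```python
import numpy as np


def lloyd_step(centers, samples):
    """One Lloyd iteration: assign samples to the nearest center, then recenter."""
    sq_dists = np.sum((samples[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    labels = np.argmin(sq_dists, axis=1)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = samples[labels == j]
        if len(members) > 0:  # an empty Voronoi cell keeps its old center
            new_centers[j] = members.mean(axis=0)
    # Each weight is the fraction of samples falling in the corresponding cell.
    weights = np.bincount(labels, minlength=len(centers)) / len(samples)
    return new_centers, weights


def lloyd(centers, samples, max_iter=100):
    """Repeat Lloyd steps until the centers stop moving."""
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        new_centers, weights = lloyd_step(centers, samples)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return new_centers, weights


rng = np.random.default_rng(0)
# Hypothetical two-cluster data, used only to exercise the functions.
samples = np.vstack([
    rng.normal(loc=-3.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(200, 2)),
])
init = np.array([[-1.0, 0.0], [1.0, 0.0]])
centers, weights = lloyd(init, samples)
print(centers, weights)
```

The hard nearest-center assignment is the step that a kernel-based relaxation would soften.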

Load-bearing premise

That the Wasserstein-Fisher-Rao gradient flow on measures can be discretized into stable particle ODEs and a convergent fixed-point iteration without further conditions on the kernel or the target distribution.

What would settle it

A concrete counter-example in which the MSIP fixed-point iteration either diverges or converges to a weighted particle set whose MMD distance to the target exceeds that achieved by standard mean-shift or Lloyd methods on the same data.

Figures

Figures reproduced from arXiv: 2502.10600 by Ayoub Belhadji, Daniel Sharp, Youssef Marzouk.

**Figure 2.** Comparison of different quantization algorithms on a GMM. (Left): dimension

**Figure 3.** Comparing quantizations of MNIST. We now illustrate our algorithms using the MNIST dataset [60]; for further results, see Appendix A.6.2. We compare MSIP, Lloyd’s algorithm, WFR, IFTflow, MMDGF, DMGD, and classical (non-interacting) mean shift (IIDMS). When Lloyd’s algorithm produces an empty Voronoi cell, we make the corresponding particle retain its position

**Figure 4.** First five univariate and pairwise marginals of the 100-dimensional distribution used in

**Figure 5.** Trajectories of four algorithms started at two different initializations (yellow and red

**Figure 6.** Comparison of the dynamics of mean shift and the discretization of MMD gradient flow

**Figure 7.** Comparison of four algorithms on MNIST for the iteration

**Figure 8.** Comparison of different algorithms’ quantization of MNIST with

**Figure 9.** Final configuration of MSIP and WFR-IPS compared to Lloyd’s algorithm with identical

**Figure 10.** Weights of WFR-IPS trajectories, marginalizing out time: The weights are sorted at

**Figure 11.** Weights of MSIP final configurations. The weights increase from left to right (ordering statistic subscript [j] is the jth smallest). (Top): Checkers target. (Bottom): Rings target.

Original abstract

Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a weighted mixture of Dirac measures that best approximates the target distribution. While much existing work relies on the Wasserstein distance to quantify approximation errors, maximum mean discrepancy (MMD) has received comparatively less attention, especially when allowing for variable particle weights. We argue that a Wasserstein-Fisher-Rao gradient flow is well-suited for designing quantizations optimal under MMD. We show that a system of interacting particles satisfying a set of ODEs discretizes this flow. We further derive a new fixed-point algorithm called mean shift interacting particles (MSIP). We show that MSIP extends the classical mean shift algorithm, widely used for identifying modes in kernel density estimators. Moreover, we show that MSIP can be interpreted as preconditioned gradient descent and that it acts as a relaxation of Lloyd's algorithm for clustering. Our unification of gradient flows, mean shift, and MMD-optimal quantization yields algorithms that are more robust than state-of-the-art methods, as demonstrated via high-dimensional and multi-modal numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives MSIP from an MMD gradient flow and shows that it extends mean shift while relaxing Lloyd's algorithm, with supporting numerics.


The main takeaway is that this work starts from a Wasserstein-Fisher-Rao gradient flow on measures to target MMD-optimal weighted quantization, discretizes it into particle ODEs, and arrives at a new fixed-point method called MSIP. The authors then show MSIP recovers mean shift as a special case and acts as a relaxed version of Lloyd's algorithm for clustering. That chain is the clearest new element. The unification is useful because it gives a single derivation route for methods that previously sat in separate literatures. The high-dimensional and multi-modal experiments add practical weight by suggesting the resulting algorithms hold up better than some baselines under those conditions.

The gradient-flow starting point is a reasonable choice for MMD with variable weights, and the move from continuous flow to discrete iteration is laid out coherently. The positioning against existing particle and kernel work is straightforward and avoids overclaiming. One soft spot is that the abstract leaves the discretization error and stability details implicit, so the full text needs to confirm that the ODE system and fixed-point step preserve the desired optimality properties without hidden restrictions on the kernel or target. The robustness claim also rests on the specific experimental controls, which would benefit from explicit comparison tables or ablation checks.

This paper is aimed at readers working on quantization, clustering, or mean-shift variants inside statistical ML. Anyone already using gradient flows or MMD for particle approximations will find the links worth reading. It deserves a serious referee because the derivation is self-contained and the algorithmic contribution is concrete enough to review on its merits.

Referee Report

0 major / 2 minor

Summary. The paper argues that a Wasserstein-Fisher-Rao gradient flow is well-suited for MMD-optimal weighted quantization of a target distribution. It shows that a system of interacting particles obeying a set of ODEs discretizes this flow, derives the mean shift interacting particles (MSIP) fixed-point algorithm from it, and claims that MSIP extends the classical mean shift algorithm, can be viewed as preconditioned gradient descent, and acts as a relaxation of Lloyd's algorithm. High-dimensional and multi-modal experiments are presented to demonstrate greater robustness than existing methods.

Significance. If the derivations are correct, the work supplies a principled gradient-flow route from MMD quantization to a practical fixed-point iteration that recovers and extends mean shift, offering a new algorithmic unification with potential advantages for clustering and particle-based approximation.

minor comments (2)

[Abstract] The statement that the ODE system 'discretizes this flow' and that MSIP 'extends' mean shift would benefit from a one-sentence pointer to the precise discretization scheme and the sense in which the extension holds (e.g., recovery of the classical update when weights are uniform).
[Abstract] The manuscript should clarify whether any restrictions on the kernel or target measure are required for the WFR flow to remain well-defined and for the MSIP iteration to preserve the claimed optimality properties.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its significance, and recommendation of minor revision. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity


The paper's central chain—from Wasserstein-Fisher-Rao gradient flow on measures, through interacting-particle ODE discretization, to the MSIP fixed-point iteration—is presented as a forward derivation that produces new algorithms (extensions of mean shift, relaxation of Lloyd). No equation or claim reduces a claimed prediction or result to a quantity defined by the same fitted parameters or by self-citation; the abstract and provided text contain no self-referential definitions, fitted-input renamings, or load-bearing uniqueness theorems imported from the authors' prior work. The derivation is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

An abstract-only review supplies no information on free parameters, background axioms, or newly postulated entities; the full manuscript would be required to populate the ledger.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "MSIP extends the classical mean shift algorithm... acts as a relaxation of Lloyd’s algorithm"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Stationary MMD Points
stat.ML · 2025-05 · unverdicted · novelty 7.0

Stationary MMD points show super-convergence in integration error over MMD for RKHS integrands, and MMD gradient flows compute them with a new non-asymptotic finite-particle error bound.

A note on the unique properties of the Kullback–Leibler divergence for sampling via gradient flows
stat.ME · 2025-07 · unverdicted · novelty 6.0

The Kullback-Leibler divergence is the only Bregman divergence whose gradient flow with respect to many popular metrics does not require the normalizing constant of the target distribution π.

Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages · cited by 2 Pith papers · 26 internal anchors

[1]

Neural Wasserstein gradient flows for maximum mean discrepancies with Riesz kernels

F. Altekr¨ uger, J. Hertrich, and G. Steidl, “Neural Wasserstein gradient flows for maximum mean discrepancies with Riesz kernels”, ICML, 2023 arXiv:2301.11624

work page arXiv 2023
[2]

Gradient flows: in metric spaces and in the space of probability measures

L. Ambrosio, N. Gigli, and G. Savar´ e, “Gradient flows: in metric spaces and in the space of probability measures”, Springer Science & Business Media, 2008

work page 2008
[3]

Maximum mean discrepancy gradient flow

M. Arbel, A. Korba, A. Salim, and A. Gretton, “Maximum mean discrepancy gradient flow”, Advances in Neural Information Processing Systems 32 (2019) arXiv:1906.04370

work page arXiv 2019
[4]

On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm

E. Arias-Castro, D. Mason, and B. Pelletier, “On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm”, Journal of Machine Learning Research 17 (2016), no. 206, 1–4

work page 2016
[5]

On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions

F. Bach, “On the equivalence between kernel quadrature rules and random feature expansions”, The Journal of Machine Learning Research 18 (2017), no. 1, 714–751, arXiv:1502.06800

work page internal anchor Pith review Pith/arXiv arXiv 2017
[6]

Kernel quadrature with DPPs

A. Belhadji, R. Bardenet, and P. Chainais, “Kernel quadrature with DPPs”, Advances in Neural Information Processing Systems 32 (2019) 12907–12917, arXiv:1906.07832

work page arXiv 2019
[7]

Kernel interpolation with continuous volume sampling

A. Belhadji, R. Bardenet, and P. Chainais, “Kernel interpolation with continuous volume sampling”, Proceedings of the 37th International Conference on Machine Learning , 2020 725–735, arXiv:2002.09677

work page arXiv 2020
[8]

An analysis of Ermakov–Zolotukhin quadrature using kernels

A. Belhadji, “An analysis of Ermakov–Zolotukhin quadrature using kernels”, Advances in Neural Information Processing Systems 34 (2021) 27278–27289, arXiv:2309.01200

work page arXiv 2021
[9]

Sketch and shift: a robust decoder for compressive clustering

A. Belhadji and R. Gribonval, “Sketch and shift: a robust decoder for compressive clustering”, Transactions on Machine Learning Research, 2024 arXiv:2312.09940

work page arXiv 2024
[10]

Frank-Wolfe Bayesian Quadrature: Probabilistic Integration with Theoretical Guarantees

F.-X. Briol, C. Oates, M. Girolami, and M. A. Osborne, “Frank-Wolfe Bayesian quadrature: Probabilistic integration with theoretical guarantees”, Advances in Neural Information Processing Systems 28 (2015) arXiv:1506.02681

work page internal anchor Pith review Pith/arXiv arXiv 2015
[11]

Gaussian mean-shift is an EM algorithm

M. A. Carreira-Perpinan, “Gaussian mean-shift is an EM algorithm”, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007), no. 5, 767–776

work page 2007
[12]

A review of mean-shift algorithms for clustering

M. A. Carreira-Perpin´ an, “A review of mean-shift algorithms for clustering”, arXiv preprint, 2015 arXiv:1503.00687

work page internal anchor Pith review Pith/arXiv arXiv 2015
[13]

A blob method for diffusion

J. A. Carrillo, K. Craig, and F. S. Patacchini, “A blob method for diffusion”, Calculus of Variations and Partial Differential Equations 58 (2019) 1–53, arXiv:1709.09195

work page internal anchor Pith review Pith/arXiv arXiv 2019
[14]

Efficient numerical integration in reproducing kernel Hilbert spaces via leverage scores sampling

A. Chatalic, N. Schreuder, E. De Vito, and L. Rosasco, “Efficient numerical integration in reproducing kernel Hilbert spaces via leverage scores sampling”, arXiv preprint, 2023 arXiv:2311.13548

work page arXiv 2023
[15]

Stein Points

W. Chen, L. Mackey, J. Gorham, F. Briol, and C. Oates, “Stein points”, in “Proceedings of the 35th International Conference on Machine Learning”, J. Dy and A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research, pp. 844–853. PMLR, 10–15 Jul 2018. arXiv:1803.10161. 13

work page internal anchor Pith review Pith/arXiv arXiv 2018
[16]

Mean shift, mode seeking, and clustering

Y. Cheng, “Mean shift, mode seeking, and clustering”, IEEE transactions on pattern analysis and machine intelligence 17 (1995), no. 8, 790–799

work page 1995
[17]

Chewi, J

S. Chewi, J. Niles-Weed, and P. Rigollet, “Statistical optimal transport”, arXiv preprint, 2024 arXiv:2407.18163

work page arXiv 2024
[18]

SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence

S. Chewi, T. Le Gouic, C. Lu, T. Maunu, and P. Rigollet, “SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence”, Advances in Neural Information Processing Systems 33 (2020) 2098–2109, arXiv:2006.02509

work page arXiv 2020
[19]

Bandwidth selection for kernel density estimation

S.-T. Chiu, “Bandwidth selection for kernel density estimation”, The Annals of Statistics , 1991 1883–1905

work page 1991
[20]

On lazy training in differentiable programming

L. Chizat, E. Oyallon, and F. Bach, “On lazy training in differentiable programming”, Advances in neural information processing systems 32 (2019) arXiv:1812.07956

work page arXiv 2019
[21]

An Interpolating Distance between Optimal Transport and Fisher-Rao

L. Chizat, G. Peyr´ e, B. Schmitzer, and F. Vialard, “An interpolating distance between optimal transport and Fisher–Rao metrics”, Foundations of Computational Mathematics 18 (2018) 1–44, arXiv:1506.06430

work page internal anchor Pith review Pith/arXiv arXiv 2018
[22]

Sparse optimization on measures with over-parameterized gradient descent

L. Chizat, “Sparse optimization on measures with over-parameterized gradient descent”, Mathematical Programming 194 (2022), no. 1, 487–532, arXiv:1907.10300

work page arXiv 2022
[23]

Mean shift: A robust approach toward feature space analysis

D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis”, IEEE Transactions on pattern analysis and machine intelligence 24 (2002), no. 5, 603–619

work page 2002
[24]

A Blob Method for the Aggregation Equation

K. Craig and A. Bertozzi, “A blob method for the aggregation equation”, Mathematics of computation 85 (2016), no. 300, 1681–1717, arXiv:1405.6424

work page internal anchor Pith review Pith/arXiv arXiv 2016
[25]

Exact Reconstruction using Beurling Minimal Extrapolation

Y. De Castro and F. Gamboa, “Exact reconstruction using beurling minimal extrapolation”, Journal of Mathematical Analysis and applications 395 (2012), no. 1, 336–354, arXiv:1103.4951

work page internal anchor Pith review Pith/arXiv arXiv 2012
[26]

On optimal center locations for radial basis function interpolation: computational aspects

S. De M., “On optimal center locations for radial basis function interpolation: computational aspects”, Rend. Splines Radial Basis Functions and Applications 61 (2003), no. 3, 343–358

work page 2003
[27]

Near-optimal data-independent point locations for radial basis function interpolation

S. De M., R. Schaback, and H. Wendland, “Near-optimal data-independent point locations for radial basis function interpolation”, Advances in Computational Mathematics 23 (2005) 317–330

work page 2005
[28]

Centroidal Voronoi tessellations: Applications and algorithms

Q. Du, V. Faber, and M. Gunzburger, “Centroidal Voronoi tessellations: Applications and algorithms”, SIAM review 41 (1999), no. 4, 637–676

work page 1999
[29]

Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations

Q. Du, M. Emelianenko, and L. Ju, “Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations”, SIAM journal on numerical analysis 44 (2006), no. 1, 102–119

work page 2006
[30]

Generalized kernel thinning

R. Dwivedi and L. Mackey, “Generalized kernel thinning”, in “The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022”. OpenReview.net, 2022. arXiv:2110.01593

work page arXiv 2022
[31]

Kernel thinning

R. Dwivedi and L. Mackey, “Kernel thinning”, Journal of Machine Learning Research 25 (2024), no. 152, 1–77, arXiv:2105.05842. 14

work page arXiv 2024
[32]

Training generative neural networks via Maximum Mean Discrepancy optimization

G. K. Dziugaite, D. M. Roy, and Z. Ghahramani, “Training generative neural networks via maximum mean discrepancy optimization”, arXiv preprint, 2015 arXiv:1505.03906

work page internal anchor Pith review Pith/arXiv arXiv 2015
[33]

Optimal Monte Carlo integration on closed manifolds

M. Ehler, M. Gr¨ af, and C. J. Oates, “Optimal Monte Carlo integration on closed manifolds”, Statistics and Computing 29 (2019), no. 6, 1203–1214, arXiv:1707.04723

work page internal anchor Pith review Pith/arXiv arXiv 2019
[34]

Nondegeneracy and weak global convergence of the Lloyd algorithm in Rd

M. Emelianenko, L. Ju, and A. Rand, “Nondegeneracy and weak global convergence of the Lloyd algorithm in Rd”, SIAM Journal on Numerical Analysis 46 (2008), no. 3, 1423–1441

work page 2008
[35]

Kernel quadrature with randomly pivoted cholesky.arXiv preprint arXiv:2306.03955,

E. Epperly and E. Moreno, “Kernel quadrature with randomly pivoted Cholesky”, Advances in Neural Information Processing Systems 36 (2023) 65850–65868, arXiv:2306.03955

work page arXiv 2023
[36]

The estimation of the gradient of a density function, with applications in pattern recognition

K. Fukunaga and L. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition”, IEEE Transactions on information theory 21 (1975), no. 1, 32–40

work page 1975
[37]

A JKO splitting scheme for Kantorovich-Fisher-Rao gradient flows

T. O. Gallou¨ et and L. Monsaingeon, “A JKO splitting scheme for Kantorovich–Fisher–Rao gradient flows”, SIAM Journal on Mathematical Analysis 49 (2017), no. 2, 1100–1130, arXiv:1602.04457

work page internal anchor Pith review Pith/arXiv arXiv 2017
[38]

On the Convergence of the Mean Shift Algorithm in the One-Dimensional Space

Y. A. Ghassabeh, “On the convergence of the mean shift algorithm in the one-dimensional space”, Pattern Recognition Letters 34 (2013), no. 12, 1423–1427, arXiv:1407.2961

work page internal anchor Pith review Pith/arXiv arXiv 2013
[39]

A sufficient condition for the convergence of the mean shift algorithm with Gaussian kernel

Y. A. Ghassabeh, “A sufficient condition for the convergence of the mean shift algorithm with Gaussian kernel”, Journal of Multivariate Analysis 135 (2015) 1–10

work page 2015
[40]

Interaction-force transport gradient flows

E. Gladin, P. Dvurechensky, A. Mielke, and J.-J. Zhu, “Interaction-force transport gradient flows”, in “Advances in Neural Information Processing Systems”, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, eds., vol. 37, pp. 14484–14508. Curran Associates, Inc., 2024. arXiv:2405.17075

work page arXiv 2024
[41]

KALE flow: A relaxed KL gradient flow for probabilities with disjoint support

P. Glaser, M. Arbel, and A. Gretton, “KALE flow: A relaxed KL gradient flow for probabilities with disjoint support”, Advances in Neural Information Processing Systems 34 (2021) 8018–8031, arXiv:2106.08929

work page arXiv 2021
[42]

Foundations of quantization for probability distributions

S. Graf and H. Luschgy, “Foundations of quantization for probability distributions”, Springer Science & Business Media, 2000

work page 2000
[43]

A kernel statistical test of independence

A. Gretton, K. Fukumizu, C. Teo, L. Song, B. Sch¨ olkopf, and A. Smola, “A kernel statistical test of independence”, Advances in neural information processing systems 20 (2007)

work page 2007
[44]

A kernel two-sample test

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Sch¨ olkopf, and A. Smola, “A kernel two-sample test”, The Journal of Machine Learning Research 13 (2012), no. 1, 723–773

work page 2012
[45]

Positively weighted kernel quadrature via subsampling

S. Hayakawa, H. Oberhauser, and T. Lyons, “Positively weighted kernel quadrature via subsampling”, Advances in Neural Information Processing Systems 35 (2022) 6886–6900, arXiv:2107.09597

work page arXiv 2022
[46]

Sampling-based Nystr¨ om approximation and kernel quadrature

S. Hayakawa, H. Oberhauser, and T. Lyons, “Sampling-based Nystr¨ om approximation and kernel quadrature”, in “International Conference on Machine Learning”, pp. 12678–12699, PMLR. 2023. 15

work page 2023
[47]

Generative sliced MMD flows with Riesz kernels

J. Hertrich, C. Wald, F. Altekr¨ uger, and P. Hagemann, “Generative sliced MMD flows with Riesz kernels”, in “The Twelfth International Conference on Learning Representations”. 2024. arXiv:2305.11463

work page arXiv 2024
[48]

Optimally-Weighted Herding is Bayesian Quadrature

F. Husz´ ar and D. Duvenaud, “Optimally–weighted herding is Bayesian quadrature”, arXiv preprint, 2012 arXiv:1204.1664

work page internal anchor Pith review Pith/arXiv arXiv 2012
[49]

The variational formulation of the Fokker–Planck equation

R. Jordan, D. Kinderlehrer, and F. Otto, “The variational formulation of the Fokker–Planck equation”, SIAM journal on mathematical analysis 29 (1998), no. 1, 1–17

work page 1998
[50]

Fully symmetric kernel quadrature

T. Karvonen and S. S¨ arkk¨ a, “Fully symmetric kernel quadrature”,SIAM Journal on Scientific Computing 40 (2018), no. 2, A697–A720, arXiv:1703.06359

work page internal anchor Pith review Pith/arXiv arXiv 2018
[51]

Gaussian kernel quadrature at scaled Gauss-Hermite nodes

T. Karvonen and S. S¨ arkk¨ a, “Gaussian kernel quadrature at scaled Gauss–Hermite nodes”, BIT Numerical Mathematics 59 (2019), no. 4, 877–902, arXiv:1803.09532

work page internal anchor Pith review Pith/arXiv arXiv 2019
[52]

Kernel-based interpolation at approximate Fekete points

T. Karvonen, S. S¨ arkk¨ a, and K. Tanaka, “Kernel-based interpolation at approximate Fekete points”, Numerical Algorithms 87 (2021) 445–468, arXiv:1912.07316

work page arXiv 2021
[53]

On the positivity and magnitudes of Bayesian quadrature weights

T. Karvonen, M. Kanagawa, and S. S¨ arkk¨ a, “On the positivity and magnitudes of Bayesian quadrature weights”, Statistics and Computing 29 (2019) 1317–1333, arXiv:1812.08509

work page arXiv 2019
[54]

Numerical methods for nonlinear equations

C. T. Kelley, “Numerical methods for nonlinear equations”, Acta Numerica 27 (2018) 207–287

work page 2018
[55]

Exponential rate of convergence for Lloyd’s method I

J. Kieffer, “Exponential rate of convergence for Lloyd’s method I”, IEEE Transactions on Information Theory 28 (1982), no. 2, 205–210

work page 1982
[56]

A new optimal transport distance on the space of finite Radon measures

S. Kondratyev, L. Monsaingeon, and D. Vorotnikov, “A new optimal transport distance on the space of finite Radon measures”, Advances in Differential Equations 21 November (2016) arXiv:1505.07746

work page internal anchor Pith review Pith/arXiv arXiv 2016
[57]

Kernel Stein discrepancy descent

A. Korba, P. Aubin-Frankowski, S. Majewski, and P. Ablin, “Kernel Stein discrepancy descent”, in “International Conference on Machine Learning”, pp. 5719–5730, PMLR. 2021. arXiv:2105.09994

work page arXiv 2021
[58]

Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering

S. Lacoste-Julien, F. Lindsten, and F. Bach, “Sequential kernel herding: Frank-Wolfe optimization for particle filtering”, in “Artificial Intelligence and Statistics”, pp. 544–552, PMLR. 2015. arXiv:1501.02056

work page internal anchor Pith review Pith/arXiv arXiv 2015
[59]

Numba: A LLVM-based Python JIT compiler

S. K. Lam, A. Pitrou, and S. Seibert, “Numba: A LLVM-based Python JIT compiler”, in “Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC”, pp. 1–6. 2015

work page 2015
[60]

MNIST handwritten digit database

Y. LeCun, C. Cortes, and C. Burges, “MNIST handwritten digit database”, ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (2010)

work page 2010
[61]

MMD GAN: Towards Deeper Understanding of Moment Matching Network

C.-L. Li, W.-C. Chang, Y. Cheng, Y. Yang, and B. P´ oczos, “MMD GAN: Towards deeper understanding of moment matching network”, Advances in neural information processing systems 30 (2017) arXiv:1705.08584

work page internal anchor Pith review Pith/arXiv arXiv 2017
[62]

A note on the convergence of the mean shift

X. Li, Z. Hu, and F. Wu, “A note on the convergence of the mean shift”, Pattern recognition 40 (2007), no. 6, 1756–1762. 16

work page 2007
[63]

Optimal Entropy-Transport problems and a new Hellinger-Kantorovich distance between positive measures

M. Liero, A. Mielke, and G. Savar´ e, “Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures”, Inventiones mathematicae 211 (2018), no. 3, 969–1117, arXiv:1508.07941

work page internal anchor Pith review Pith/arXiv arXiv 2018
[64]

Stein variational gradient descent: A general purpose Bayesian inference algorithm

Q. Liu and D. Wang, “Stein variational gradient descent: A general purpose Bayesian inference algorithm”, Advances in neural information processing systems 29 (2016) arXiv:1608.04471

work page arXiv 2016
[65]

Birth–death dynamics for sampling: global convergence, approximations and their asymptotics

Y. Lu, D. Slepˇ cev, and L. Wang, “Birth–death dynamics for sampling: global convergence, approximations and their asymptotics”, Nonlinearity 36 (2023), no. 11, 5731, arXiv:2211.00450

work page arXiv 2023
[66]

Accelerating Langevin Sampling with Birth-death

Y. Lu, J. Lu, and J. Nolen, “Accelerating Langevin sampling with birth-death”, arXiv preprint, 2019, arXiv:1905.09863

[67]

Sampling in unit time with kernel Fisher–Rao flow

A. Maurais and Y. Marzouk, “Sampling in unit time with kernel Fisher–Rao flow”, in “Proceedings of the 41st International Conference on Machine Learning”, vol. 235 of Proceedings of Machine Learning Research, pp. 35138–35162. PMLR, 21–27 Jul 2024. arXiv:2401.03892

[68]

Kernel mean embedding of distributions: A review and beyond

K. Muandet, K. Fukumizu, B. Sriperumbudur, B. Schölkopf, et al., “Kernel mean embedding of distributions: A review and beyond”, Foundations and Trends® in Machine Learning 10 (2017), no. 1-2, 1–141, arXiv:1605.09522

[69]

Slice sampling

R. M. Neal, “Slice sampling”, The Annals of Statistics 31 June (2003)

[70]

Construction of optimal cubature algorithms with applications to econometrics and uncertainty quantification

J. Oettershagen, “Construction of optimal cubature algorithms with applications to econometrics and uncertainty quantification”, Verlag Dr. Hut, 2017

[71]

The geometry of dissipative evolution equations: the porous medium equation

F. Otto, “The geometry of dissipative evolution equations: the porous medium equation”, Communications in Partial Differential Equations, 2001

[72]

Statistically efficient thinning of a Markov chain sampler

A. B. Owen, “Statistically efficient thinning of a Markov chain sampler”, Journal of Computational and Graphical Statistics 26 (2017), no. 3, 738–744, arXiv:1510.07727

[73]

Pointwise convergence of the Lloyd algorithm in higher dimension

G. Pagès and J. Yu, “Pointwise convergence of the Lloyd algorithm in higher dimension”, SIAM Journal on Control and Optimization 54 (2016), no. 5, 2354–2382, arXiv:1401.0192

[74]

Computational optimal transport

G. Peyré and M. Cuturi, “Computational optimal transport: With applications to data science”, Foundations and Trends® in Machine Learning 11 (2019), no. 5-6, 355–607, arXiv:1803.00567

[75]

n-Widths in Approximation Theory

A. Pinkus, “n-Widths in Approximation Theory”, Springer Science & Business Media, 2012

[76]

On the sequential convergence of Lloyd's algorithms

L. Portales, E. Cazelles, and E. Pauwels, “On the sequential convergence of Lloyd’s algorithms”, arXiv preprint, 2024, arXiv:2405.20744

[77]

Interactive supercomputing on 40,000 cores for machine learning and data analysis

A. Reuther, J. Kepner, C. Byun, S. Samsi, W. Arcand, D. Bestor, B. Bergeron, V. Gadepally, M. Houle, M. Hubbell, M. Jones, A. Klein, L. Milechin, J. Mullen, A. Prout, A. Rosa, C. Yee, and P. Michaleas, “Interactive supercomputing on 40,000 cores for machine learning and data analysis”, in “2018 IEEE High Performance extreme Computing Conference (HPEC)”, p...

[78]

Optimal thinning of MCMC output

M. Riabiz, W. Y. Chen, J. Cockayne, P. Swietach, S. A. Niederer, L. Mackey, and C. J. Oates, “Optimal thinning of MCMC output”, Journal of the Royal Statistical Society Series B: Statistical Methodology 84 (2022), no. 4, 1059–1081, arXiv:2005.03952

[79]

Monte Carlo statistical methods

C. P. Robert and G. Casella, “Monte Carlo statistical methods”, Springer, 1999

[80]

Global convergence of neuron birth-death dynamics

G. Rotskoff, S. Jelassi, J. Bruna, and E. Vanden-Eijnden, “Global convergence of neuron birth-death dynamics”, in “International Conference on Machine Learning”. 2019. arXiv:1902.01843

