Weighted quantization using MMD: From mean field to mean shift via gradient flows
Pith reviewed 2026-05-23 02:45 UTC · model grok-4.3
The pith
A Wasserstein-Fisher-Rao gradient flow on measures, discretized by interacting particles, produces the MSIP algorithm for MMD-optimal weighted quantization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Wasserstein-Fisher-Rao gradient flow minimizes MMD between a target probability measure and a weighted atomic measure; its particle discretization yields ordinary differential equations whose equilibria satisfy an extended mean-shift fixed-point equation that simultaneously generalizes mode-finding in kernel density estimation and relaxes Lloyd iteration for clustering.
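For concreteness, the objective in this claim can be written out using the standard closed form of the squared MMD. The notation is an assumption made here for illustration and is not taken from the paper: k is a positive-definite kernel, \pi is the target, and the weighted atomic measure has particles y_i with weights w_i.

```latex
\mu = \sum_{i=1}^{M} w_i \, \delta_{y_i},
\qquad
\mathrm{MMD}^2(\pi, \mu)
  = \iint k(x, x') \, d\pi(x) \, d\pi(x')
  - 2 \sum_{i=1}^{M} w_i \int k(x, y_i) \, d\pi(x)
  + \sum_{i,j=1}^{M} w_i w_j \, k(y_i, y_j).
```

The first term does not depend on the particles or weights, so only the last two terms matter when optimizing over (w_i, y_i).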
What carries the argument
The Wasserstein-Fisher-Rao gradient flow discretized into a system of interacting-particle ODEs whose fixed-point iteration is the mean shift interacting particles (MSIP) algorithm.
If this is right
- MSIP recovers the classical mean shift update when all particle weights are forced equal.
- MSIP can be rewritten as preconditioned gradient descent on the MMD objective.
- MSIP functions as a relaxation of Lloyd's algorithm when applied to clustering tasks.
- The particle discretization inherits the robustness properties observed for the underlying gradient flow in high-dimensional and multimodal regimes.
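The two classical iterations named in the list above can be sketched as follows. This is a minimal illustration of textbook mean shift (Gaussian kernel) and Lloyd's algorithm, not the paper's MSIP update; the function names, the bandwidth, and the synthetic two-cluster data are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mean_shift_step(points, data, bandwidth):
    """Classical mean shift: move each point to the kernel-weighted mean of the data."""
    k = gaussian_kernel(points, data, bandwidth)  # shape (m, n)
    return (k @ data) / k.sum(axis=1, keepdims=True)

def lloyd_step(centers, data):
    """Lloyd update: hard-assign data to the nearest center, then average each cell."""
    sq = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = sq.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = data[labels == j]
        if len(members) > 0:  # an empty cell keeps its previous center
            new_centers[j] = members.mean(axis=0)
    return new_centers

# Synthetic data (assumption): two well-separated Gaussian clusters in 2D.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-3.0, 0.5, size=(200, 2)),
                       rng.normal(3.0, 0.5, size=(200, 2))])
start = np.array([[-2.0, -2.0], [2.0, 2.0]])

modes = start.copy()
centers = start.copy()
for _ in range(50):
    modes = mean_shift_step(modes, data, bandwidth=1.0)
    centers = lloyd_step(centers, data)
```

Mean shift averages the data with soft kernel weights, while Lloyd averages with hard nearest-center assignments. That contrast is one way to read the "relaxation" language in the list, though the precise sense is the paper's to define.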
Where Pith is reading between the lines
- The same particle ODE discretization could be applied to other discrepancy measures whose gradient flows admit similar mean-field descriptions.
- Variable particle weights arising from the flow may improve mode recovery in kernel density estimation compared with uniform-weight mean shift.
- Because MSIP is a relaxation of Lloyd iteration, its convergence rate on finite mixtures may be governed by the same contraction arguments used for k-means.
Load-bearing premise
The Wasserstein-Fisher-Rao gradient flow on measures can be discretized into stable particle ODEs and a convergent fixed-point iteration without further conditions on the kernel or the target distribution.
What would settle it
A concrete counter-example in which the MSIP fixed-point iteration either diverges or converges to a weighted particle set whose MMD distance to the target exceeds that achieved by standard mean-shift or Lloyd methods on the same data.
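A comparison of this kind needs a way to score a weighted particle set against the target. The sketch below is one possible scorer, assuming the target is represented by a finite sample and the kernel is Gaussian; `mmd_squared`, the bandwidth, and the synthetic data are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd_squared(sample, particles, weights, bandwidth=1.0):
    """Squared MMD between the empirical measure of `sample` and sum_i w_i * delta_{y_i}."""
    k_xx = gaussian_kernel(sample, sample, bandwidth).mean()
    k_xy = gaussian_kernel(sample, particles, bandwidth).mean(axis=0) @ weights
    k_yy = weights @ gaussian_kernel(particles, particles, bandwidth) @ weights
    return k_xx - 2.0 * k_xy + k_yy

# Synthetic target (assumption): an unbalanced two-cluster sample, 75% / 25%.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-3.0, 0.5, size=(300, 2)),
                         rng.normal(3.0, 0.5, size=(100, 2))])
particles = np.array([[-3.0, -3.0], [3.0, 3.0]])

mmd_uniform = mmd_squared(sample, particles, np.array([0.5, 0.5]))
mmd_matched = mmd_squared(sample, particles, np.array([0.75, 0.25]))
print(f"uniform weights: {mmd_uniform:.4f}, matched weights: {mmd_matched:.4f}")
```

The same function can score the output of any two quantization methods on the same sample, which is the comparison the test above calls for.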
Original abstract
Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a weighted mixture of Dirac measures that best approximates the target distribution. While much existing work relies on the Wasserstein distance to quantify approximation errors, maximum mean discrepancy (MMD) has received comparatively less attention, especially when allowing for variable particle weights. We argue that a Wasserstein-Fisher-Rao gradient flow is well-suited for designing quantizations optimal under MMD. We show that a system of interacting particles satisfying a set of ODEs discretizes this flow. We further derive a new fixed-point algorithm called mean shift interacting particles (MSIP). We show that MSIP extends the classical mean shift algorithm, widely used for identifying modes in kernel density estimators. Moreover, we show that MSIP can be interpreted as preconditioned gradient descent and that it acts as a relaxation of Lloyd's algorithm for clustering. Our unification of gradient flows, mean shift, and MMD-optimal quantization yields algorithms that are more robust than state-of-the-art methods, as demonstrated via high-dimensional and multi-modal numerical experiments.
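To illustrate what variable particle weights buy under MMD, the sketch below computes the weights that minimize the squared MMD for fixed particle positions. This is the standard unconstrained kernel-quadrature solution (solve the Gram system K w = z), not necessarily the weight update used in the paper. It assumes a Gaussian kernel, distinct particles (so the Gram matrix is invertible), and a target represented by a finite sample; all names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def optimal_weights(sample, particles, bandwidth=1.0):
    """Minimize MMD^2 over weights for fixed particles: solve K(Y, Y) w = z."""
    gram = gaussian_kernel(particles, particles, bandwidth)
    # z_i approximates the integral of k(x, y_i) against the target, via the sample mean.
    z = gaussian_kernel(particles, sample, bandwidth).mean(axis=1)
    return np.linalg.solve(gram, z)

# Synthetic target (assumption): an unbalanced two-cluster sample, 75% / 25%.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-3.0, 0.5, size=(300, 2)),
                         rng.normal(3.0, 0.5, size=(100, 2))])
particles = np.array([[-3.0, -3.0], [3.0, 3.0]])

weights = optimal_weights(sample, particles)
print(weights)
```

Weights obtained this way need not be nonnegative or sum to one in general, which is one reason constrained or flow-based weight updates are of interest.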
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that a Wasserstein-Fisher-Rao gradient flow is well-suited for MMD-optimal weighted quantization of a target distribution. It shows that a system of interacting particles obeying a set of ODEs discretizes this flow, derives the mean shift interacting particles (MSIP) fixed-point algorithm from it, and claims that MSIP extends the classical mean shift algorithm, can be viewed as preconditioned gradient descent, and acts as a relaxation of Lloyd's algorithm. High-dimensional and multi-modal experiments are presented to demonstrate greater robustness than existing methods.
Significance. If the derivations are correct, the work supplies a principled gradient-flow route from MMD quantization to a practical fixed-point iteration that recovers and extends mean shift, offering a new algorithmic unification with potential advantages for clustering and particle-based approximation.
minor comments (2)
- [Abstract] The statement that the ODE system 'discretizes this flow' and that MSIP 'extends' mean shift would benefit from a one-sentence pointer to the precise discretization scheme and the sense in which the extension holds (e.g., recovery of the classical update when weights are uniform).
- [Abstract] The manuscript should clarify whether any restrictions on the kernel or target measure are required for the WFR flow to remain well-defined and for the MSIP iteration to preserve the claimed optimality properties.
Simulated Author's Rebuttal
We thank the referee for their positive summary of the manuscript, recognition of its significance, and recommendation of minor revision. No specific major comments were listed in the report.
Circularity Check
No significant circularity
The paper's central chain—from Wasserstein-Fisher-Rao gradient flow on measures, through interacting-particle ODE discretization, to the MSIP fixed-point iteration—is presented as a forward derivation that produces new algorithms (extensions of mean shift, relaxation of Lloyd). No equation or claim reduces a claimed prediction or result to a quantity defined by the same fitted parameters or by self-citation; the abstract and provided text contain no self-referential definitions, fitted-input renamings, or load-bearing uniqueness theorems imported from the authors' prior work. The derivation is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "MSIP extends the classical mean shift algorithm... acts as a relaxation of Lloyd's algorithm"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Stationary MMD Points
  Stationary MMD points show super-convergence in integration error over MMD for RKHS integrands, and MMD gradient flows compute them with a new non-asymptotic finite-particle error bound.
- A note on the unique properties of the Kullback-Leibler divergence for sampling via gradient flows
  The Kullback-Leibler divergence is the only Bregman divergence whose gradient flow with respect to many popular metrics does not require the normalizing constant of the target distribution π.