Fast Computation of Free-Support Wasserstein Medians
Pith reviewed 2026-06-26 18:19 UTC · model grok-4.3
The pith
A direct fixed-weight relocation using exact OT projections computes free-support Wasserstein medians without nested barycenter solves.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The relocation rule that sends each current support atom to the inverse-distance-weighted average of its barycentric projections obtained from exact optimal transport plans to the input measures is the exact minimizer of a tight majorization-minimization surrogate for the smoothed median objective; this surrogate property supplies monotone descent on exact transport subproblems together with convex-hull invariance, a finite-time best-residual convergence rate, residual-to-gradient control when the objective is differentiable, and fixed-point and stationarity characterizations.
What carries the argument
the inverse-distance-weighted average of barycentric projections obtained from exact optimal transport plans to the input measures
If this is right
- Monotone descent holds on every exact transport subproblem.
- The support of the median stays inside the convex hull of the input supports.
- A finite-time best-residual convergence rate is obtained.
- Under differentiability the residual controls the gradient norm.
- Fixed points of the iteration are stationary for the smoothed objective.
Where Pith is reading between the lines
- The resolution-consistency result implies that the same fixed-weight rule can be applied at successively finer discretizations without re-tuning weights.
- The stability analysis suggests the method remains useful for online aggregation when new measures arrive one at a time.
- Because only exact transport subproblems are required, the approach can be paired with any fast exact OT solver without having to implement a nested barycenter routine.
Load-bearing premise
The fixed-weight approximation to the true median objective remains close enough to the variable-weight optimum that the computational savings still produce accurate medians.
What would settle it
A benchmark instance in which the direct solver's final objective value is shown to exceed the value attained by a nested Weiszfeld solver by more than a small tolerance when the same number of support points and the same smoothing parameter are used.
read the original abstract
The Wasserstein median is a robust alternative to the Wasserstein barycenter for averaging probability measures, but exact empirical computation can be expensive. A natural metric-space Weiszfeld scheme updates the current candidate by solving a weighted Wasserstein barycenter problem at each outer iteration, producing a nested optimization problem. We propose a direct fixed-weight free-support solver that avoids this inner barycenter loop. At each iteration, the method solves exact optimal transport (OT) subproblems from the current candidate to the input measures, computes barycentric projections of the selected plans, and relocates each support atom to an inverse-distance-weighted average of its projected destinations. For a smoothed median objective, we show that this relocation is the exact minimizer of a tight majorization--minimization surrogate. This yields monotone descent for exact transport subproblems, convex-hull invariance, a finite-time best-residual rate, residual-to-gradient control under differentiability, and fixed-point and stationarity characterizations. We also give smoothing, stability, and resolution-consistency results clarifying the fixed-weight approximation. In exact-OT benchmarks, the direct solver attains median objectives close to tightly solved nested Weiszfeld baselines while using substantially fewer exact transport subproblems. Additional contamination, posterior aggregation, and image-prototype experiments show that the direct solver produces median summaries comparable to nested computation and less sensitive to outlying distributions than Wasserstein barycenters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a direct fixed-weight free-support solver for Wasserstein medians. At each iteration it solves exact OT subproblems from the current candidate to the input measures, computes barycentric projections, and relocates each support atom to an inverse-distance-weighted average of its projected destinations. For a smoothed median objective the relocation is shown to be the exact minimizer of a tight MM surrogate; this yields monotone descent (even with exact transport), convex-hull invariance, a finite-time best-residual rate, residual-to-gradient control, and fixed-point/stationarity characterizations. Separate smoothing, stability, and resolution-consistency results are given to justify the fixed-weight approximation. In exact-OT benchmarks the method attains objective values close to nested Weiszfeld baselines while using substantially fewer transport subproblems; additional experiments on contamination, posterior aggregation, and image prototypes show comparable summaries that are less sensitive to outliers than barycenters.
Significance. If the MM-surrogate derivation and the listed descent/invariance/stationarity properties hold, the work supplies an efficient, theoretically grounded algorithm for free-support Wasserstein medians that avoids an inner barycenter loop. The explicit credit given to monotone descent for exact OT subproblems, convex-hull invariance, and the finite-time rate is a strength. The stability results, even if only qualitative, help bridge the smoothed analysis to the exact-OT setting that is actually benchmarked.
major comments (2)
- [stability and resolution-consistency results (referenced after the MM analysis)] The central claim that the direct solver produces medians comparable to nested Weiszfeld baselines with fewer OT calls rests on the fixed-weight approximation being sufficiently close to the true (non-smooth) median objective. The smoothing/stability/resolution-consistency results are invoked for this justification, yet they appear to supply only qualitative or asymptotic control rather than explicit quantitative bounds relating the surrogate gap to the smoothing parameter and the number of atoms; without such bounds the monotone descent on the surrogate does not automatically guarantee closeness of the attained objective to the true median in the exact-OT regime.
- [exact-OT benchmarks section] § on empirical benchmarks: the reported closeness of median objectives to the tightly solved nested baselines is presented without an accompanying error table or plot that quantifies the gap attributable to the fixed-weight approximation versus the gap due to early termination; this makes it difficult to isolate whether the computational saving is obtained at the cost of a measurable increase in objective value.
minor comments (2)
- [Abstract] Abstract: the phrase 'finite-time best-residual rate' is not immediately clear; a brief parenthetical gloss on what 'best-residual' denotes would help readers.
- [smoothing parameter definition] Notation: the smoothing parameter is introduced as a free parameter; its dependence (or independence) on the number of atoms should be stated explicitly when the resolution-consistency result is presented.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address the major comments point by point below and will incorporate revisions to improve clarity and empirical presentation.
read point-by-point responses
-
Referee: The central claim that the direct solver produces medians comparable to nested Weiszfeld baselines with fewer OT calls rests on the fixed-weight approximation being sufficiently close to the true (non-smooth) median objective. The smoothing/stability/resolution-consistency results are invoked for this justification, yet they appear to supply only qualitative or asymptotic control rather than explicit quantitative bounds relating the surrogate gap to the smoothing parameter and the number of atoms; without such bounds the monotone descent on the surrogate does not automatically guarantee closeness of the attained objective to the true median in the exact-OT regime.
Authors: We agree that the smoothing, stability, and resolution-consistency results provide primarily qualitative and asymptotic justification rather than explicit quantitative bounds on the surrogate gap as a function of the smoothing parameter and atom count. The monotone descent property therefore does not yield a rigorous guarantee of closeness to the exact median objective. In the revision we will add an explicit remark acknowledging this limitation of the current analysis and discuss its implications for interpreting the exact-OT benchmarks. revision: yes
-
Referee: § on empirical benchmarks: the reported closeness of median objectives to the tightly solved nested baselines is presented without an accompanying error table or plot that quantifies the gap attributable to the fixed-weight approximation versus the gap due to early termination; this makes it difficult to isolate whether the computational saving is obtained at the cost of a measurable increase in objective value.
Authors: We concur that an explicit quantification of the objective gaps would help readers separate the contribution of the fixed-weight approximation from early termination effects. We will revise the empirical section to include a table (or supplementary plot) reporting objective values for both solvers together with the number of OT subproblems used, thereby clarifying the observed trade-off. revision: yes
Circularity Check
No significant circularity; derivation relies on independent MM surrogate analysis
full rationale
The paper derives the fixed-weight relocation update as the exact minimizer of a majorization-minimization surrogate constructed for a smoothed median objective, then establishes monotone descent and related properties from that surrogate. This chain depends on external exact OT subproblems and standard MM arguments rather than reducing any claimed result to a fitted parameter, self-referential definition, or load-bearing self-citation. Smoothing/stability results address the fixed-weight approximation separately without circular reduction. The central algorithmic claim therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- smoothing parameter
axioms (1)
- standard math Optimal transport plans exist between the candidate and input measures and can be solved exactly
Reference graph
Works this paper leans on
-
[1]
Danskin, John M. , editor =. The. doi:10.1007/978-3-642-46092-0 , urldate =
-
[2]
Mathematical Programming , author =
Proximal alternating linearized minimization for nonconvex and nonsmooth problems , volume =. Mathematical Programming , author =. 2014 , pages =. doi:10.1007/s10107-013-0701-9 , language =
-
[3]
Mathematical Programming , author =
Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized. Mathematical Programming , author =. 2013 , pages =. doi:10.1007/s10107-011-0484-9 , language =
-
[4]
Les équations aux dérivées partielles , author =
Une propriété topologique des sous-ensembles analytiques réels , volume =. Les équations aux dérivées partielles , author =
-
[5]
USSR Computational Mathematics and Mathematical Physics , author =
Gradient methods for the minimisation of functionals , volume =. USSR Computational Mathematics and Mathematical Physics , author =. 1963 , pages =. doi:10.1016/0041-5553(63)90382-3 , language =
-
[6]
Mathematical Programming , author =
A note on. Mathematical Programming , author =. 1973 , pages =. doi:10.1007/BF01584648 , language =
-
[7]
Learning multiple layers of features from tiny images , institution =
Krizhevsky, Alex and Hinton, Geoffrey , year =. Learning multiple layers of features from tiny images , institution =
-
[8]
Topics in
Villani, Cédric , year =. Topics in
-
[9]
Villani, Cédric , year =. Optimal
-
[10]
Probability measures on metric spaces of nonpositive curvature , volume =
Sturm, Karl-Theodor , editor =. Probability measures on metric spaces of nonpositive curvature , volume =. Contemporary. 2003 , pages =. doi:10.1090/conm/338/06080 , language =
-
[11]
Computational. Foundations and Trends® in Machine Learning , author =. 2019 , pages =. doi:10.1561/2200000073 , language =
-
[12]
Optimal transport mapping via input convex neural networks , volume =
Makkuva, Ashok and Taghvaei, Amirhossein and Oh, Sewoong and Lee, Jason , editor =. Optimal transport mapping via input convex neural networks , volume =. Proceedings of the 37th international conference on machine learning , publisher =. 2020 , pages =
2020
-
[13]
Scalable
Fan, Amirhossein, Jiaojiao, Taghvaei and Chen, Yongxin , editor =. Scalable. Proceedings of the 38th international conference on machine learning , publisher =. 2021 , pages =
2021
-
[14]
Manole, Tudor and Balakrishnan, Sivaraman and Niles-Weed, Jonathan and Wasserman, Larry , year =. Plugin. doi:10.48550/ARXIV.2107.12364 , urldate =
-
[15]
On the. Management Science , author =. 1958 , pages =. doi:10.1287/mnsc.5.1.1 , language =
-
[16]
Uscidda, Théo and Cuturi, Marco , month = feb, year =. The
-
[17]
The Annals of Statistics , author =
Minimax estimation of smooth optimal transport maps , volume =. The Annals of Statistics , author =. doi:10.1214/20-AOS1997 , number =
-
[18]
Communications on Pure and Applied Mathematics , author =
Polar factorization and monotone rearrangement of vector‐valued functions , volume =. Communications on Pure and Applied Mathematics , author =. 1991 , pages =. doi:10.1002/cpa.3160440402 , language =
-
[19]
doi:10.21227/KATB-ZV89 , abstract =
Brunner, Clemens and Leeb, Robert and Müller-Putz, Gernot , year =. doi:10.21227/KATB-ZV89 , abstract =
-
[20]
Fan, Jiaojiao and Liu, Shu and Ma, Shaojun and Zhou, Haomin and Chen, Yongxin , month = nov, year =. Neural
-
[21]
IMA Journal of Numerical Analysis , author =
Quantitative stability and error estimates for optimal transport plans , volume =. IMA Journal of Numerical Analysis , author =. 2021 , pages =. doi:10.1093/imanum/draa045 , abstract =
-
[22]
SIAM Journal on Mathematical Analysis , author =
Minimizing. SIAM Journal on Mathematical Analysis , author =. 2003 , pages =. doi:10.1137/S0036141002410927 , language =
-
[23]
Mémoire sur la théorie des déblais et des remblais , publisher =
Monge, Gaspard , year =. Mémoire sur la théorie des déblais et des remblais , publisher =
-
[24]
arXiv:2103.04621 [math] , author =
Geodesic of the. arXiv:2103.04621 [math] , author =. 2021 , keywords =
arXiv 2021
-
[25]
International Journal of Computer Vision , author =
A. International Journal of Computer Vision , author =. 2006 , pages =. doi:10.1007/s11263-005-3222-z , language =
-
[26]
Journal of Mathematical Imaging and Vision , author =
Intrinsic. Journal of Mathematical Imaging and Vision , author =. 2006 , pages =. doi:10.1007/s10851-006-6228-4 , language =
-
[27]
Mapping estimation for discrete optimal transport , volume =
Perrot, Michaël and Courty, Nicolas and Flamary, Rémi and Habrard, Amaury , editor =. Mapping estimation for discrete optimal transport , volume =. Advances in
-
[28]
Joint distribution optimal transportation for domain adaptation , volume =
Courty, Nicolas and Flamary, Rémi and Habrard, Amaury and Rakotomamonjy, Alain , editor =. Joint distribution optimal transportation for domain adaptation , volume =. Advances in
-
[29]
IEEE Transactions on Pattern Analysis and Machine Intelligence , author =
Optimal. IEEE Transactions on Pattern Analysis and Machine Intelligence , author =. 2017 , pages =. doi:10.1109/TPAMI.2016.2615921 , number =
-
[30]
Wasserstein discriminant analysis , volume =. Machine Learning , author =. 2018 , pages =. doi:10.1007/s10994-018-5717-1 , language =
-
[31]
Bonet, Clément and Malézieux, Benoît and Rakotomamonjy, Alain and Drumetz, Lucas and Moreau, Thomas and Kowalski, Matthieu and Courty, Nicolas , month = mar, year =. Sliced-
-
[32]
Journal of Neural Engineering , author =. 2018 , pages =. doi:10.1088/1741-2552/aadea0 , number =
-
[33]
Frontiers in Neuroscience , author =
Review of the. Frontiers in Neuroscience , author =. doi:10.3389/fnins.2012.00055 , urldate =
-
[34]
Mathematical Programming , author =
An alternative to. Mathematical Programming , author =. 2020 , pages =. doi:10.1007/s10107-019-01381-4 , language =
-
[35]
2019 , keywords =
Applied directional statistics , isbn =. 2019 , keywords =
2019
-
[36]
Proceedings of the American Mathematical Society , author =
Positive definite matrices and the. Proceedings of the American Mathematical Society , author =. 2015 , pages =. doi:10.1090/proc/12953 , language =
-
[37]
Computational Statistics , author =
A short note on parameter approximation for von. Computational Statistics , author =. 2012 , pages =. doi:10.1007/s00180-011-0232-x , language =
-
[38]
A new metric on the manifold of kernel matrices with application to matrix geometric means , volume =
Sra, Suvrit , editor =. A new metric on the manifold of kernel matrices with application to matrix geometric means , volume =. Advances in neural information processing systems , publisher =
-
[39]
Efficient similarity search for covariance matrices via the
Cherian, Anoop and Sra, Suvrit and Banerjee, Arindam and Papanikolopoulos, Nikolaos , month = nov, year =. Efficient similarity search for covariance matrices via the. 2011. doi:10.1109/ICCV.2011.6126523 , urldate =
-
[40]
Journal of Machine Learning Research , author =
Clustering on the unit hypersphere using von mises-fisher distributions , volume =. Journal of Machine Learning Research , author =. 2005 , pages =
2005
-
[41]
First-order
Zhang, Hongyi and Sra, Suvrit , editor =. First-order. 29th. 2016 , pages =
2016
-
[42]
SIAM Journal on Mathematical Analysis , author =
Linearized. SIAM Journal on Mathematical Analysis , author =. 2024 , pages =. doi:10.1137/23M1564535 , language =
-
[43]
Santambrogio, Filippo , year =. Optimal. doi:10.1007/978-3-319-20828-2 , language =
-
[44]
Communications on Pure and Applied Mathematics , author =
Riemannian. Communications on Pure and Applied Mathematics , author =. 1977 , pages =. doi:10.1002/cpa.3160300502 , language =
-
[45]
Van Den Bosch, Jasper Jf and Golan, Tal and Peters, Benjamin and Taylor, JohnMark and Shahbazi, Mahdiyar and Lin, Baihan and Charest, Ian and Diedrichsen, Jörn and Kriegeskorte, Nikolaus and Mur, Marieke and Schütt, Heiko H , month = sep, year =. A. doi:10.7554/eLife.107828.1 , abstract =
-
[46]
Cheng, Zixiong and Liu, Hang , year =. Robust. doi:10.48550/ARXIV.2603.07563 , abstract =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2603.07563
-
[47]
Journal of the Royal Statistical Society Series B: Statistical Methodology , author =
Huber means on. Journal of the Royal Statistical Society Series B: Statistical Methodology , author =. 2026 , pages =. doi:10.1093/jrsssb/qkaf054 , language =
-
[48]
You, Kisung , year =. Finite. doi:10.48550/ARXIV.2604.24895 , urldate =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.24895
-
[49]
Michigan Mathematical Journal , author =
A class of. Michigan Mathematical Journal , author =. 1984 , note =. doi:10.1307/mmj/1029003026 , number =
-
[50]
, year =
Witkin, Andrew P. , year =. Scale-space filtering , booktitle =
-
[51]
Annual Review of Statistics and Its Application , author =
Topological. Annual Review of Statistics and Its Application , author =. 2018 , pages =. doi:10.1146/annurev-statistics-031017-100045 , abstract =
-
[52]
Wand, M.P. and Jones, M.C. , month = dec, year =. Kernel. doi:10.1201/b14876 , language =
-
[53]
The Annals of Statistics , author =
On nonparametric estimation of density level sets , volume =. The Annals of Statistics , author =. doi:10.1214/aos/1069362732 , number =
-
[54]
The Annals of Statistics , author =
Variable. The Annals of Statistics , author =. doi:10.1214/aos/1176348768 , number =
-
[55]
Journal of Computational and Graphical Statistics , author =
A. Journal of Computational and Graphical Statistics , author =. 2010 , pages =. doi:10.1198/jcgs.2009.07049 , language =
-
[56]
Journal of Classification , author =
Estimating the. Journal of Classification , author =. 2003 , pages =. doi:10.1007/s00357-003-0004-6 , number =
-
[57]
The Annals of Statistics , author =
Fully adaptive density-based clustering , volume =. The Annals of Statistics , author =. doi:10.1214/15-AOS1331 , number =
-
[58]
Information and Control , author =
Some inequalities satisfied by the quantities of information of. Information and Control , author =. 1959 , pages =. doi:10.1016/S0019-9958(59)90348-1 , language =
-
[59]
Score-Based Generative Modeling through Stochastic Differential Equations
Song, Yang and Sohl-Dickstein, Jascha and Kingma, Diederik P. and Kumar, Abhishek and Ermon, Stefano and Poole, Ben , year =. Score-. doi:10.48550/ARXIV.2011.13456 , abstract =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2011.13456 2011
-
[60]
Generative modeling by estimating gradients of the data distribution , abstract =
Song, Yang and Ermon, Stefano , year =. Generative modeling by estimating gradients of the data distribution , abstract =. Proceedings of the 33rd
-
[61]
Sohl-Dickstein, Jascha and Weiss, Eric and Maheswaranathan, Niru and Ganguli, Surya , editor =. Deep. Proceedings of the 32nd. 2015 , pages =
2015
-
[62]
Local-. Cerebral Cortex , author =. 2018 , pages =. doi:10.1093/cercor/bhx179 , language =
-
[63]
Silverman, B.W. , month = feb, year =. Density. doi:10.1201/9781315140919 , language =
-
[64]
Journal of the Royal Statistical Society Series B: Statistical Methodology , author =
Using. Journal of the Royal Statistical Society Series B: Statistical Methodology , author =. 1981 , pages =. doi:10.1111/j.2517-6161.1981.tb01155.x , abstract =
-
[65]
From. Technometrics , author =. 2001 , pages =. doi:10.1198/004017001316975916 , language =
-
[66]
Scott, David W. , month = mar, year =. Multivariate. doi:10.1002/9781118575574 , language =
-
[67]
Journal d'Analyse Mathématique , author =
On. Journal d'Analyse Mathématique , author =. 1951 , pages =. doi:10.1007/BF02790092 , language =
-
[68]
The Annals of Statistics , author =
Asymptotics and optimal bandwidth selection for highest density region estimation , volume =. The Annals of Statistics , author =. doi:10.1214/09-AOS766 , number =
-
[69]
Probability of indecomposability of a random mapping function.Ann
Remarks on. The Annals of Mathematical Statistics , author =. 1956 , pages =. doi:10.1214/aoms/1177728190 , language =
-
[70]
The Annals of Statistics , author =
Generalized density clustering , volume =. The Annals of Statistics , author =. doi:10.1214/10-AOS797 , number =
-
[71]
Optimal rates for plug-in estimators of density level sets , volume =. Bernoulli , author =. doi:10.3150/09-BEJ184 , number =
-
[72]
Journal of the Royal Statistical Society Series B: Statistical Methodology , author =
On. Journal of the Royal Statistical Society Series B: Statistical Methodology , author =. 1997 , pages =. doi:10.1111/1467-9868.00095 , abstract =
-
[73]
The Annals of Statistics , author =
The topography of multivariate normal mixtures , volume =. The Annals of Statistics , author =. doi:10.1214/009053605000000417 , number =
-
[74]
Computational Statistics & Data Analysis , author =
Alternating kernel and mixture density estimates , volume =. Computational Statistics & Data Analysis , author =. 2000 , pages =. doi:10.1016/S0167-9473(00)00003-7 , language =
-
[75]
The Annals of Statistics , author =
Measuring. The Annals of Statistics , author =. doi:10.1214/aos/1176324626 , number =
-
[76]
and Wang, Bei and Zheng, Yan , editor =
Phillips, Jeff M. and Wang, Bei and Zheng, Yan , editor =. Geometric. 2015 , pages =. doi:10.4230/LIPICS.SOCG.2015.857 , booktitle =
-
[77]
The Annals of Mathematical Statistics , author =
On. The Annals of Mathematical Statistics , author =. 1962 , pages =. doi:10.1214/aoms/1177704472 , language =
-
[78]
Journal of Computational and Graphical Statistics , author =
The. Journal of Computational and Graphical Statistics , author =. 1993 , pages =. doi:10.1080/10618600.1993.10474599 , language =
-
[79]
The Annals of Statistics , author =
Nonparametric testing of the existence of modes , volume =. The Annals of Statistics , author =. doi:10.1214/aos/1031594735 , number =
-
[80]
, year =
Lindsay, Bruce G. , year =. Mixture models: theory, geometry, and applications , isbn =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.