Metric-valued regression
Pith reviewed 2026-05-24 12:33 UTC · model grok-4.3
The pith
Metric medoids yield the first strongly Bayes-consistent regressor for mappings between arbitrary separable metric spaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that metric medoids produce a procedure that is strongly Bayes-consistent for regression between topologically separable metric spaces X and Y when Y is bounded in expectation. At this level of generality the result is the first to cover unbounded loss in the agnostic setting, and existing methods are shown not to achieve consistency on general metrics.
What carries the argument
Metric medoids, a variant of Fréchet means that selects the training point minimizing average distance to the others in the sample, serve as the estimator replacing standard empirical risk minimization.
If this is right
- Existing methods based on standard empirical risk minimization fail to achieve Bayes consistency on general instance and label metrics.
- The algorithm is efficient and applies whenever the two spaces satisfy the stated separability and boundedness conditions.
- Semi-stable compression supplies a new proof device that may be reused for other consistency arguments.
- Strong consistency holds for unbounded loss, removing a restriction that previously limited theory to bounded or Euclidean cases.
Where Pith is reading between the lines
- The same medoid construction could be tested on outputs taking values in graphs or other structured metric spaces not covered by Euclidean assumptions.
- Semi-stable compression might adapt to prove consistency results in related settings such as clustering or nearest-neighbor rules.
- Removing the bounded-expectation condition on Y would require a different estimator or truncation argument.
Load-bearing premise
Topological separability of both the input and output metric spaces together with bounded expectation on the output space.
What would settle it
A concrete pair of non-separable metric spaces on which the proposed procedure fails to converge in probability to the Bayes regressor.
read the original abstract
We propose an efficient algorithm for learning mappings between two metric spaces, $\X$ and $\Y$. Our procedure is strongly Bayes-consistent whenever $\X$ and $\Y$ are topologically separable and $\Y$ is "bounded in expectation" (our term; the separability assumption can be somewhat weakened). At this level of generality, ours is the first such learnability result for unbounded loss in the agnostic setting. Our technique is based on metric medoids (a variant of Fr\'echet means) and presents a significant departure from existing methods, which, as we demonstrate, fail to achieve Bayes-consistency on general instance- and label-space metrics. Our proofs introduce the technique of {\em semi-stable compression}, which may be of independent interest.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an algorithm for regression between metric spaces X and Y based on metric medoids (a variant of Fréchet means). It claims that the procedure is strongly Bayes-consistent whenever X and Y are topologically separable and Y is bounded in expectation; this is presented as the first such learnability result for unbounded loss in the agnostic setting. The proofs rely on a new technique called semi-stable compression. The authors also argue that prior methods fail to achieve Bayes-consistency on general instance- and label-space metrics.
Significance. If the consistency theorem holds under the stated topological and moment conditions, the result would be a notable contribution to statistical learning theory for structured outputs, extending consistency guarantees beyond bounded-loss or Euclidean settings. The semi-stable compression technique is highlighted as potentially reusable. The claim of being first at this level of generality would strengthen the significance, provided the comparison to existing work is accurate and the proofs are free of hidden restrictions.
major comments (2)
- [Abstract / §1] The abstract states the consistency result but does not indicate the precise form of the estimator or the compression scheme; without the full derivation (e.g., the definition of the medoid-based predictor and how semi-stable compression is applied to the risk), it is impossible to verify that the stated conditions on separability and bounded expectation are both necessary and sufficient rather than artifacts of the proof technique.
- [Introduction / §2] The claim that existing methods 'fail to achieve Bayes-consistency on general instance- and label-space metrics' requires explicit counter-examples or a theorem showing divergence; if this is only demonstrated for specific choices of metric (rather than arbitrary metrics), the contrast with the new result is weakened.
minor comments (2)
- [Abstract] The term 'bounded in expectation' is introduced without a formal definition in the abstract; a precise statement (e.g., E[d(Y,y0)] < ∞ for some y0) should appear early in the main text.
- [Abstract] The manuscript should clarify whether the topological separability assumption can be relaxed to separability of the supports of the marginal and conditional distributions rather than the entire spaces.
Simulated Author's Rebuttal
We thank the referee for their review and for raising these points on clarity and comparison to prior work. We respond to each major comment below.
read point-by-point responses
-
Referee: [Abstract / §1] The abstract states the consistency result but does not indicate the precise form of the estimator or the compression scheme; without the full derivation (e.g., the definition of the medoid-based predictor and how semi-stable compression is applied to the risk), it is impossible to verify that the stated conditions on separability and bounded expectation are both necessary and sufficient rather than artifacts of the proof technique.
Authors: The abstract is a high-level summary. The metric medoid estimator is formally defined in Section 3, and Section 4 derives the semi-stable compression argument that yields strong Bayes-consistency precisely when X and Y are separable and Y has bounded expectation. These conditions are sufficient for the argument and arise directly from controlling the risk deviation for unbounded losses on general metrics; the paper does not claim they are necessary in every conceivable setting. The full derivation is in the body, so verification is possible from the manuscript as submitted. revision: partial
-
Referee: [Introduction / §2] The claim that existing methods 'fail to achieve Bayes-consistency on general instance- and label-space metrics' requires explicit counter-examples or a theorem showing divergence; if this is only demonstrated for specific choices of metric (rather than arbitrary metrics), the contrast with the new result is weakened.
Authors: Section 2 supplies explicit counterexamples showing that standard estimators (including direct Fréchet-mean and certain kernel or embedding-based procedures) diverge from the Bayes predictor on general separable metrics when the loss is unbounded. The constructions are metric-agnostic in the sense that they apply to any metric spaces satisfying the separability assumption; they are not restricted to particular choices such as Euclidean or discrete metrics. This supports the stated contrast with our result. revision: no
Circularity Check
No significant circularity; derivation self-contained under stated assumptions
full rationale
The paper's central claim is a Bayes-consistency theorem for a metric-medoid procedure, conditioned explicitly on topological separability of X and Y plus Y bounded in expectation. The abstract and description present this as a new result at the stated level of generality, with the proof technique (semi-stable compression) introduced as potentially independent. No equations or steps in the provided material reduce a prediction or uniqueness claim to a fitted parameter or self-citation by construction. The result is externally falsifiable via the stated assumptions and does not rely on load-bearing self-citations or ansatzes smuggled from prior work. This is the normal case of a self-contained theoretical derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption X and Y are topologically separable
- domain assumption Y is bounded in expectation
Reference graph
Works this paper leans on
-
[1]
Y. Ashlagi, L. Gottlieb, and A. Kontorovich. Functions with average smoothness: structure, algorithms, and learning. In M. Belkin and S. Kpotufe, editors, Conference on Learning Theory, COLT 2021, 15-19 August 2021, Boulder, Colorado, USA , volume 134 of Proceedings of Machine Learning Research, pages 186--236. PMLR , 2021. URL http://proceedings.mlr.pres...
work page 2021
-
[2]
T. Z. Baharav and D. Tse. Ultra fast medoid identification via correlated sequential halving. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch \' e - Buc, E. B. Fox, and R. Garnett, editors, NeurIPS, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/c4de8ced6214345614d33fb0b16a8acd-Abstract.html
work page 2019
-
[3]
R. Bardenet and O.-A. Maillard. Concentration inequalities for sampling without replacement . Bernoulli, 21 0 (3): 0 1361 -- 1385, 2015. doi:10.3150/14-BEJ605. URL https://doi.org/10.3150/14-BEJ605
- [4]
- [5]
-
[6]
M. Blanchard and R. Cosson. Universal online learning with bounded loss: Reduction to binary classification. 2021
work page 2021
-
[7]
M. Blanchard, R. Cosson, and S. Hanneke. Universal online learning with unbounded losses: Memory is all you need. 2021
work page 2021
-
[8]
O. Bousquet, S. Hanneke, S. Moran, and N. Zhivotovskiy. Proper learning, helly number, and an optimal SVM bound. In J. D. Abernethy and S. Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria] , volume 125 of Proceedings of Machine Learning Research, pages 582--609. PMLR , 2020. URL http://proceedings.ml...
work page 2020
-
[9]
M. V. der Laan, K. Pollard, and J. Bryan. A new partitioning around medoids algorithm. Journal of Statistical Computation and Simulation, 73 0 (8): 0 575--584, 2003. doi:10.1080/0094965031000136012. URL https://doi.org/10.1080/0094965031000136012
-
[10]
S. N. Evans and A. Q. Jaffe. Strong laws of large numbers for Fr\'echet means, 2020
work page 2020
-
[11]
F. Ferraty, A. Laksaci, A. Tadj, and P. Vieu. Kernel regression with functional response . Electronic Journal of Statistics, 5 0 (none): 0 159 -- 171, 2011. doi:10.1214/11-EJS600. URL https://doi.org/10.1214/11-EJS600
-
[12]
P. Fraigniaud, E. Lebhar, and L. Viennot. The inframetric model for the internet. In IEEE INFOCOM 2008 - The 27th Conference on Computer Communications, pages 1085--1093, 2008. doi:10.1109/INFOCOM.2008.163
-
[13]
L. Gottlieb, A. Kontorovich, and R. Krauthgamer. Efficient classification for metric data (extended abstract COLT 2010). IEEE Transactions on Information Theory , 60 0 (9): 0 5750--5759, 2014. doi:10.1109/TIT.2014.2339840. URL http://dx.doi.org/10.1109/TIT.2014.2339840
-
[14]
L.-A. Gottlieb, A. Kontorovich, and R. Krauthgamer. Adaptive metric dimensionality reduction (extended abstract: ALT 2013). Theoretical Computer Science, pages 105--118, 2016
work page 2013
-
[15]
L.-A. Gottlieb, A. Kontorovich, and R. Krauthgamer. Efficient regression in metric spaces via approximate L ipschitz extension. IEEE Transactions on Information Theory , 63 0 (8): 0 4838--4849, 2017
work page 2017
-
[16]
L. Gy \"o rfi and R. Weiss. Universal consistency and rates of convergence of multiclass prototype algorithms in metric spaces. Journal of Machine Learning Research, 22 0 (151): 0 1--25, 2021. URL http://jmlr.org/papers/v22/20-1081.html
work page 2021
-
[17]
L. Gy\" o rfi, M. Kohler, A. Krzy\. z ak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag New York, Inc., 2002
work page 2002
-
[18]
S. Hanneke. Universally consistent online learning with arbitrarily dependent responses, preprint. 2021 a
work page 2021
-
[19]
S. Hanneke. Learning whenever learning is possible: Universal learning under general stochastic processes. Journal of Machine Learning Research, 22 0 (130): 0 1--116, 2021 b . URL http://jmlr.org/papers/v22/17-298.html
work page 2021
-
[20]
S. Hanneke and A. Kontorovich. Stable sample compression schemes: New applications and an optimal SVM margin bound. In V. Feldman, K. Ligett, and S. Sabato, editors, Algorithmic Learning Theory, 16-19 March 2021, Virtual Conference, Worldwide, volume 132 of Proceedings of Machine Learning Research, pages 697--721. PMLR , 2021. URL http://proceedings.mlr.p...
work page 2021
-
[21]
S. Hanneke, A. Kontorovich, S. Sabato, and R. Weiss. Universal Bayes consistency in metric spaces . The Annals of Statistics, 49 0 (4): 0 2129 -- 2150, 2021. doi:10.1214/20-AOS2029. URL https://doi.org/10.1214/20-AOS2029
-
[22]
A. Kontorovich and R. Weiss. A B ayes consistent 1- NN classifier. In Artificial Intelligence and Statistics (AISTATS 2015), 2014
work page 2015
-
[23]
S. Kpotufe and N. Verma. Time-accuracy tradeoffs in kernel prediction: Controlling prediction quality. J. Mach. Learn. Res., 18: 0 44:1--44:29, 2017. URL http://jmlr.org/papers/v18/16-538.html
work page 2017
-
[24]
A. Maurer and M. Pontil. Empirical bernstein bounds and sample variance penalization, 2009
work page 2009
-
[25]
E. Morvant, S. Ko c o, and L. Ralaivola. PAC -bayesian generalization bound on confusion matrix for multi-class classification. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012 . icml.cc / Omnipress, 2012. URL http://icml.cc/2012/papers/434.pdf
work page 2012
-
[26]
A. Naor. Metric embeddings and lipschitz extensions, 2015
work page 2015
-
[27]
A. Naor and S. Sheffield. Absolutely minimal lipschitz extension of tree-valued mappings. Mathematische Annalen, 354 0 (3): 0 1049--1078, Nov. 2012. ISSN 0025-5831. doi:10.1007/s00208-011-0753-1
-
[28]
J. Newling and F. Fleuret. A sub-quadratic exact medoid algorithm. In A. Singh and X. J. Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA , volume 54 of Proceedings of Machine Learning Research, pages 185--193. PMLR , 2017. URL http://proceedi...
work page 2017
-
[29]
D. Pollard. A user's guide to measure theoretic probability. Cambridge University Press, 2002
work page 2002
-
[30]
M. J. Schervish. Theory of statistics. Springer Series in Statistics. Springer-Verlag, New York, 1995. ISBN 0-387-94546-6. doi:10.1007/978-1-4612-4250-5. URL https://doi.org/10.1007/978-1-4612-4250-5
- [31]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.