pith. sign in

arxiv: 2202.03045 · v1 · submitted 2022-02-07 · 💻 cs.LG · stat.ML

Metric-valued regression

Pith reviewed 2026-05-24 12:33 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords metric regressionBayes consistencymetric spacesagnostic learningFréchet meanssemi-stable compressionlearnability
0
0 comments X

The pith

Metric medoids yield the first strongly Bayes-consistent regressor for mappings between arbitrary separable metric spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an algorithm that learns functions from one metric space to another and proves it is strongly Bayes-consistent whenever the spaces are topologically separable and the output space satisfies a bounded-in-expectation condition. This holds for unbounded loss functions in the agnostic setting, a regime where prior methods are shown to fail. The method departs from empirical risk minimization by using metric medoids as the estimator. Proofs rely on a new technique called semi-stable compression. A reader would care because it supplies the first general learnability guarantee for regression with metric-valued outputs.

Core claim

The central claim is that metric medoids produce a procedure that is strongly Bayes-consistent for regression between topologically separable metric spaces X and Y when Y is bounded in expectation. At this level of generality the result is the first to cover unbounded loss in the agnostic setting, and existing methods are shown not to achieve consistency on general metrics.

What carries the argument

Metric medoids, a variant of Fréchet means that selects the training point minimizing average distance to the others in the sample, serve as the estimator replacing standard empirical risk minimization.

If this is right

  • Existing methods based on standard empirical risk minimization fail to achieve Bayes consistency on general instance and label metrics.
  • The algorithm is efficient and applies whenever the two spaces satisfy the stated separability and boundedness conditions.
  • Semi-stable compression supplies a new proof device that may be reused for other consistency arguments.
  • Strong consistency holds for unbounded loss, removing a restriction that previously limited theory to bounded or Euclidean cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same medoid construction could be tested on outputs taking values in graphs or other structured metric spaces not covered by Euclidean assumptions.
  • Semi-stable compression might adapt to prove consistency results in related settings such as clustering or nearest-neighbor rules.
  • Removing the bounded-expectation condition on Y would require a different estimator or truncation argument.

Load-bearing premise

Topological separability of both the input and output metric spaces together with bounded expectation on the output space.

What would settle it

A concrete pair of non-separable metric spaces on which the proposed procedure fails to converge in probability to the Bayes regressor.

read the original abstract

We propose an efficient algorithm for learning mappings between two metric spaces, $\X$ and $\Y$. Our procedure is strongly Bayes-consistent whenever $\X$ and $\Y$ are topologically separable and $\Y$ is "bounded in expectation" (our term; the separability assumption can be somewhat weakened). At this level of generality, ours is the first such learnability result for unbounded loss in the agnostic setting. Our technique is based on metric medoids (a variant of Fr\'echet means) and presents a significant departure from existing methods, which, as we demonstrate, fail to achieve Bayes-consistency on general instance- and label-space metrics. Our proofs introduce the technique of {\em semi-stable compression}, which may be of independent interest.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an algorithm for regression between metric spaces X and Y based on metric medoids (a variant of Fréchet means). It claims that the procedure is strongly Bayes-consistent whenever X and Y are topologically separable and Y is bounded in expectation; this is presented as the first such learnability result for unbounded loss in the agnostic setting. The proofs rely on a new technique called semi-stable compression. The authors also argue that prior methods fail to achieve Bayes-consistency on general instance- and label-space metrics.

Significance. If the consistency theorem holds under the stated topological and moment conditions, the result would be a notable contribution to statistical learning theory for structured outputs, extending consistency guarantees beyond bounded-loss or Euclidean settings. The semi-stable compression technique is highlighted as potentially reusable. The claim of being first at this level of generality would strengthen the significance, provided the comparison to existing work is accurate and the proofs are free of hidden restrictions.

major comments (2)
  1. [Abstract / §1] The abstract states the consistency result but does not indicate the precise form of the estimator or the compression scheme; without the full derivation (e.g., the definition of the medoid-based predictor and how semi-stable compression is applied to the risk), it is impossible to verify that the stated conditions on separability and bounded expectation are both necessary and sufficient rather than artifacts of the proof technique.
  2. [Introduction / §2] The claim that existing methods 'fail to achieve Bayes-consistency on general instance- and label-space metrics' requires explicit counter-examples or a theorem showing divergence; if this is only demonstrated for specific choices of metric (rather than arbitrary metrics), the contrast with the new result is weakened.
minor comments (2)
  1. [Abstract] The term 'bounded in expectation' is introduced without a formal definition in the abstract; a precise statement (e.g., E[d(Y,y0)] < ∞ for some y0) should appear early in the main text.
  2. [Abstract] The manuscript should clarify whether the topological separability assumption can be relaxed to separability of the supports of the marginal and conditional distributions rather than the entire spaces.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review and for raising these points on clarity and comparison to prior work. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract / §1] The abstract states the consistency result but does not indicate the precise form of the estimator or the compression scheme; without the full derivation (e.g., the definition of the medoid-based predictor and how semi-stable compression is applied to the risk), it is impossible to verify that the stated conditions on separability and bounded expectation are both necessary and sufficient rather than artifacts of the proof technique.

    Authors: The abstract is a high-level summary. The metric medoid estimator is formally defined in Section 3, and Section 4 derives the semi-stable compression argument that yields strong Bayes-consistency precisely when X and Y are separable and Y has bounded expectation. These conditions are sufficient for the argument and arise directly from controlling the risk deviation for unbounded losses on general metrics; the paper does not claim they are necessary in every conceivable setting. The full derivation is in the body, so verification is possible from the manuscript as submitted. revision: partial

  2. Referee: [Introduction / §2] The claim that existing methods 'fail to achieve Bayes-consistency on general instance- and label-space metrics' requires explicit counter-examples or a theorem showing divergence; if this is only demonstrated for specific choices of metric (rather than arbitrary metrics), the contrast with the new result is weakened.

    Authors: Section 2 supplies explicit counterexamples showing that standard estimators (including direct Fréchet-mean and certain kernel or embedding-based procedures) diverge from the Bayes predictor on general separable metrics when the loss is unbounded. The constructions are metric-agnostic in the sense that they apply to any metric spaces satisfying the separability assumption; they are not restricted to particular choices such as Euclidean or discrete metrics. This supports the stated contrast with our result. revision: no

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained under stated assumptions

full rationale

The paper's central claim is a Bayes-consistency theorem for a metric-medoid procedure, conditioned explicitly on topological separability of X and Y plus Y bounded in expectation. The abstract and description present this as a new result at the stated level of generality, with the proof technique (semi-stable compression) introduced as potentially independent. No equations or steps in the provided material reduce a prediction or uniqueness claim to a fitted parameter or self-citation by construction. The result is externally falsifiable via the stated assumptions and does not rely on load-bearing self-citations or ansatzes smuggled from prior work. This is the normal case of a self-contained theoretical derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Result rests on domain assumptions of topological separability for X and Y plus bounded expectation for Y; no free parameters or invented entities are mentioned.

axioms (2)
  • domain assumption X and Y are topologically separable
    Stated as necessary for the strong Bayes-consistency result.
  • domain assumption Y is bounded in expectation
    Explicitly required; described as the authors' term.

pith-pipeline@v0.9.0 · 5644 in / 1169 out tokens · 30711 ms · 2026-05-24T12:33:14.509191+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

  1. [1]

    Ashlagi, L

    Y. Ashlagi, L. Gottlieb, and A. Kontorovich. Functions with average smoothness: structure, algorithms, and learning. In M. Belkin and S. Kpotufe, editors, Conference on Learning Theory, COLT 2021, 15-19 August 2021, Boulder, Colorado, USA , volume 134 of Proceedings of Machine Learning Research, pages 186--236. PMLR , 2021. URL http://proceedings.mlr.pres...

  2. [2]

    T. Z. Baharav and D. Tse. Ultra fast medoid identification via correlated sequential halving. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch \' e - Buc, E. B. Fox, and R. Garnett, editors, NeurIPS, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/c4de8ced6214345614d33fb0b16a8acd-Abstract.html

  3. [3]

    Bardenet and O.-A

    R. Bardenet and O.-A. Maillard. Concentration inequalities for sampling without replacement . Bernoulli, 21 0 (3): 0 1361 -- 1385, 2015. doi:10.3150/14-BEJ605. URL https://doi.org/10.3150/14-BEJ605

  4. [4]

    Biess, A

    A. Biess, A. Kontorovich, Y. Makarychev, and H. Zaichyk. Regression via kirszbraun extension with applications to imitation learning. CoRR, abs/1905.11930, 2019. URL http://arxiv.org/abs/1905.11930

  5. [5]

    Blanchard

    M. Blanchard. Universal strong and weak online learning with bounded loss, preprint. 2021

  6. [6]

    Blanchard and R

    M. Blanchard and R. Cosson. Universal online learning with bounded loss: Reduction to binary classification. 2021

  7. [7]

    Blanchard, R

    M. Blanchard, R. Cosson, and S. Hanneke. Universal online learning with unbounded losses: Memory is all you need. 2021

  8. [8]

    Bousquet, S

    O. Bousquet, S. Hanneke, S. Moran, and N. Zhivotovskiy. Proper learning, helly number, and an optimal SVM bound. In J. D. Abernethy and S. Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria] , volume 125 of Proceedings of Machine Learning Research, pages 582--609. PMLR , 2020. URL http://proceedings.ml...

  9. [9]

    M. V. der Laan, K. Pollard, and J. Bryan. A new partitioning around medoids algorithm. Journal of Statistical Computation and Simulation, 73 0 (8): 0 575--584, 2003. doi:10.1080/0094965031000136012. URL https://doi.org/10.1080/0094965031000136012

  10. [10]

    S. N. Evans and A. Q. Jaffe. Strong laws of large numbers for Fr\'echet means, 2020

  11. [11]

    Ferraty, A

    F. Ferraty, A. Laksaci, A. Tadj, and P. Vieu. Kernel regression with functional response . Electronic Journal of Statistics, 5 0 (none): 0 159 -- 171, 2011. doi:10.1214/11-EJS600. URL https://doi.org/10.1214/11-EJS600

  12. [12]

    Fraigniaud, E

    P. Fraigniaud, E. Lebhar, and L. Viennot. The inframetric model for the internet. In IEEE INFOCOM 2008 - The 27th Conference on Computer Communications, pages 1085--1093, 2008. doi:10.1109/INFOCOM.2008.163

  13. [13]

    Gottlieb, A

    L. Gottlieb, A. Kontorovich, and R. Krauthgamer. Efficient classification for metric data (extended abstract COLT 2010). IEEE Transactions on Information Theory , 60 0 (9): 0 5750--5759, 2014. doi:10.1109/TIT.2014.2339840. URL http://dx.doi.org/10.1109/TIT.2014.2339840

  14. [14]

    Gottlieb, A

    L.-A. Gottlieb, A. Kontorovich, and R. Krauthgamer. Adaptive metric dimensionality reduction (extended abstract: ALT 2013). Theoretical Computer Science, pages 105--118, 2016

  15. [15]

    Gottlieb, A

    L.-A. Gottlieb, A. Kontorovich, and R. Krauthgamer. Efficient regression in metric spaces via approximate L ipschitz extension. IEEE Transactions on Information Theory , 63 0 (8): 0 4838--4849, 2017

  16. [16]

    Gy \"o rfi and R

    L. Gy \"o rfi and R. Weiss. Universal consistency and rates of convergence of multiclass prototype algorithms in metric spaces. Journal of Machine Learning Research, 22 0 (151): 0 1--25, 2021. URL http://jmlr.org/papers/v22/20-1081.html

  17. [17]

    Gy\" o rfi, M

    L. Gy\" o rfi, M. Kohler, A. Krzy\. z ak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag New York, Inc., 2002

  18. [18]

    S. Hanneke. Universally consistent online learning with arbitrarily dependent responses, preprint. 2021 a

  19. [19]

    S. Hanneke. Learning whenever learning is possible: Universal learning under general stochastic processes. Journal of Machine Learning Research, 22 0 (130): 0 1--116, 2021 b . URL http://jmlr.org/papers/v22/17-298.html

  20. [20]

    Hanneke and A

    S. Hanneke and A. Kontorovich. Stable sample compression schemes: New applications and an optimal SVM margin bound. In V. Feldman, K. Ligett, and S. Sabato, editors, Algorithmic Learning Theory, 16-19 March 2021, Virtual Conference, Worldwide, volume 132 of Proceedings of Machine Learning Research, pages 697--721. PMLR , 2021. URL http://proceedings.mlr.p...

  21. [21]

    Hanneke, A

    S. Hanneke, A. Kontorovich, S. Sabato, and R. Weiss. Universal Bayes consistency in metric spaces . The Annals of Statistics, 49 0 (4): 0 2129 -- 2150, 2021. doi:10.1214/20-AOS2029. URL https://doi.org/10.1214/20-AOS2029

  22. [22]

    Kontorovich and R

    A. Kontorovich and R. Weiss. A B ayes consistent 1- NN classifier. In Artificial Intelligence and Statistics (AISTATS 2015), 2014

  23. [23]

    Kpotufe and N

    S. Kpotufe and N. Verma. Time-accuracy tradeoffs in kernel prediction: Controlling prediction quality. J. Mach. Learn. Res., 18: 0 44:1--44:29, 2017. URL http://jmlr.org/papers/v18/16-538.html

  24. [24]

    Maurer and M

    A. Maurer and M. Pontil. Empirical bernstein bounds and sample variance penalization, 2009

  25. [25]

    Morvant, S

    E. Morvant, S. Ko c o, and L. Ralaivola. PAC -bayesian generalization bound on confusion matrix for multi-class classification. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012 . icml.cc / Omnipress, 2012. URL http://icml.cc/2012/papers/434.pdf

  26. [26]

    A. Naor. Metric embeddings and lipschitz extensions, 2015

  27. [27]

    Naor and S

    A. Naor and S. Sheffield. Absolutely minimal lipschitz extension of tree-valued mappings. Mathematische Annalen, 354 0 (3): 0 1049--1078, Nov. 2012. ISSN 0025-5831. doi:10.1007/s00208-011-0753-1

  28. [28]

    Newling and F

    J. Newling and F. Fleuret. A sub-quadratic exact medoid algorithm. In A. Singh and X. J. Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA , volume 54 of Proceedings of Machine Learning Research, pages 185--193. PMLR , 2017. URL http://proceedi...

  29. [29]

    D. Pollard. A user's guide to measure theoretic probability. Cambridge University Press, 2002

  30. [30]

    M. J. Schervish. Theory of statistics. Springer Series in Statistics. Springer-Verlag, New York, 1995. ISBN 0-387-94546-6. doi:10.1007/978-1-4612-4250-5. URL https://doi.org/10.1007/978-1-4612-4250-5

  31. [31]

    Sch\"otz

    C. Sch\"otz. Strong laws of large numbers for generalizations of Fr\'echet mean sets, 2021