Realizable Bayes-Consistency for General Metric Losses

Aryeh Kontorovich; Dan Tsir Cohen; Steve Hanneke

arxiv: 2605.03823 · v2 · submitted 2026-05-05 · 💻 cs.LG · cs.IT· math.IT· math.ST· stat.TH

Realizable Bayes-Consistency for General Metric Losses

Dan Tsir Cohen , Steve Hanneke , Aryeh Kontorovich This is my paper

Pith reviewed 2026-05-15 06:55 UTC · model grok-4.3

classification 💻 cs.LG cs.ITmath.ITmath.STstat.TH

keywords realizable Bayes-consistencymetric lossesLittlestone treeshypothesis classesdistribution-free learningcombinatorial characterization

0 comments

The pith

A hypothesis class permits strong realizable Bayes-consistency for any metric loss if and only if it contains no infinite non-decreasing (gamma_k)-Littlestone tree with gamma_k tending to infinity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper determines the exact conditions on a hypothesis class H that guarantee the existence of a distribution-free learning rule whose risk converges almost surely to the optimal risk of zero whenever the data is realizable by some member of H. This holds for general metric losses on the label space, which may be unbounded. A sympathetic reader cares because the result supplies a sharp combinatorial test that separates classes where universal consistency is possible from those where no such rule exists, without any distributional assumptions on the data. The characterization extends earlier results for 0-1 classification and real-valued regression by replacing the classical Littlestone dimension with a scaled, non-decreasing tree obstruction.

Core claim

For any hypothesis class H, there exists a distribution-free learning rule whose risk converges almost surely to the best-in-class risk of zero for every realizable data-generating distribution if and only if H does not contain an infinite non-decreasing (gamma_k)-Littlestone tree with gamma_k tending to infinity.

What carries the argument

The infinite non-decreasing (gamma_k)-Littlestone tree, a combinatorial structure that forces the loss to grow unboundedly along some sequence of instances and thereby blocks any consistent learning rule.

If this is right

Consistency is possible precisely when the hypothesis class avoids all infinite non-decreasing (gamma_k)-Littlestone trees.
The result applies to any metric loss, including unbounded ones, on arbitrary label spaces.
The same learning rule works uniformly across all realizable distributions without needing to know the distribution in advance.
The characterization is both necessary and sufficient, resolving the realizable case of the open problem stated in prior work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The tree condition supplies a concrete test that can be applied to specific families of hypothesis classes to decide consistency in advance.
The same obstruction may appear in related settings such as online learning or agnostic consistency when losses are metric.
Extending the result to non-metric losses would likely require identifying different combinatorial structures that play the role of the scaled Littlestone tree.

Load-bearing premise

The loss must form a metric on the label space, and the only obstruction to consistency must be exactly the existence of an infinite non-decreasing (gamma_k)-Littlestone tree with gamma_k going to infinity.

What would settle it

Exhibit a concrete hypothesis class that contains an infinite non-decreasing (gamma_k)-Littlestone tree yet still admits a distribution-free rule achieving almost-sure convergence to zero risk on every realizable distribution, or exhibit one without such a tree that fails to do so.

read the original abstract

We study strong universal Bayes-consistency in the realizable setting for learning with general metric losses, extending classical characterizations beyond $0$-$1$ classification (Bousquet et al., 2020; Hanneke et al., 2021) and real-valued regression (Attias et al., 2024). Given an instance space $(X,\rho)$, a label space $(Y,\ell)$ with possibly unbounded loss, and a hypothesis class $H \subseteq Y^{X}$, we resolve the realizable case of an open problem presented in Tsir Cohen and Kontorovich (2022). Specifically, we find the necessary and sufficient conditions on the hypothesis class $H$ under which there exists a distribution-free learning rule whose risk converges almost surely to the best-in-class risk (which is zero) for every realizable data-generating distribution. Our main contribution is this sharp characterization in terms of a combinatorial obstruction: Similarly to Attias et al. (2024), we introduce the notion of an infinite non-decreasing $(\gamma_k)$-Littlestone tree, where $\gamma_k \to \infty$. This extends the Littlestone tree structure used in Bousquet et al. (2020) to the metric loss setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper closes the realizable Bayes-consistency case for general metric losses with a sharp combinatorial condition based on a generalized Littlestone tree.

read the letter

The main takeaway is that Tsir Cohen, Hanneke, and Kontorovich have resolved the open realizable case by giving necessary and sufficient conditions on H: a distribution-free rule exists that drives risk to zero almost surely on every realizable distribution if and only if H contains no infinite non-decreasing (γ_k)-Littlestone tree with γ_k tending to infinity. This directly extends the tree-based obstructions from the 0-1 setting in Bousquet et al. and the regression case in Attias et al. to arbitrary metric losses, possibly unbounded. The new tree definition captures the metric structure by letting the separation parameter grow without bound, which unifies the prior results without adding boundedness or completeness assumptions on the loss. The characterization looks sharp and non-circular, resting on the combinatorial definition rather than fitted parameters or self-referential arguments. The paper handles both directions cleanly enough that the stress-test finds no load-bearing gaps. Minor soft spots include the need for explicit verification in the full proofs that almost-sure convergence holds when losses are unbounded, and perhaps a few more worked examples for discrete versus continuous label spaces to make the tree construction intuitive. The citation pattern is appropriate and builds directly on the cited priors. This work is aimed at learning theorists who care about consistency characterizations and combinatorial dimensions in structured prediction or metric learning. A reader looking for precise obstructions to learnability will find it useful. It deserves peer review because the result is formally grounded, resolves a stated open problem, and supplies evidence in the form of a necessary-and-sufficient criterion that can be checked independently.

Referee Report

2 major / 2 minor

Summary. The paper resolves the realizable case of an open problem on strong universal Bayes-consistency for general metric losses. It gives necessary and sufficient conditions on a hypothesis class H ⊆ Y^X such that there exists a distribution-free learning rule whose risk converges almost surely to the best-in-class risk (zero) for every realizable data-generating distribution. The characterization is expressed via the absence of an infinite non-decreasing (γ_k)-Littlestone tree with γ_k → ∞, extending the combinatorial tools from 0-1 classification and real-valued regression to the metric-loss setting.

Significance. If the central characterization holds, the result is a significant advance: it supplies the first sharp, distribution-free necessary-and-sufficient condition for strong realizable consistency under general (possibly unbounded) metric losses. The introduction of the non-decreasing (γ_k)-Littlestone tree is a clean combinatorial generalization that unifies prior work and directly answers the question posed in Tsir Cohen & Kontorovich (2022). The paper thereby provides both a theoretical closure and a concrete obstruction that can be checked for concrete hypothesis classes.

major comments (2)

[§4, Theorem 4.1] §4, Theorem 4.1 (sufficiency): the construction of the distribution-free rule is only sketched at a high level; an explicit description (or pseudocode) of how the rule uses the tree-free assumption to achieve a.s. convergence would make the argument self-contained and easier to verify.
[§3.2, Definition 3.3] §3.2, Definition 3.3: the necessity direction claims that any infinite non-decreasing (γ_k)-Littlestone tree with γ_k → ∞ blocks every distribution-free rule; the proof relies on the metric property of ℓ, but it is not shown whether the triangle inequality is essential or whether the same obstruction holds for arbitrary losses.

minor comments (2)

[§2] The notation for the instance metric ρ and label metric ℓ is introduced in §2 but reused without re-statement in later sections; a short reminder of the domains would improve readability.
[Figure 1] Figure 1 (illustration of the (γ_k)-tree) does not annotate the successive γ_k values on the branches; adding these labels would clarify the non-decreasing condition.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment, the recommendation of minor revision, and the constructive comments. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [§4, Theorem 4.1] §4, Theorem 4.1 (sufficiency): the construction of the distribution-free rule is only sketched at a high level; an explicit description (or pseudocode) of how the rule uses the tree-free assumption to achieve a.s. convergence would make the argument self-contained and easier to verify.

Authors: We agree that the sufficiency argument in Theorem 4.1 would benefit from greater explicitness. In the revised version we will supply a detailed algorithmic description together with pseudocode that shows precisely how the absence of an infinite non-decreasing (γ_k)-Littlestone tree is used to construct the learning rule and to guarantee almost-sure convergence of the risk to the best-in-class value of zero. revision: yes
Referee: [§3.2, Definition 3.3] §3.2, Definition 3.3: the necessity direction claims that any infinite non-decreasing (γ_k)-Littlestone tree with γ_k → ∞ blocks every distribution-free rule; the proof relies on the metric property of ℓ, but it is not shown whether the triangle inequality is essential or whether the same obstruction holds for arbitrary losses.

Authors: The necessity proof is developed specifically for metric losses and makes essential use of the triangle inequality when showing that an infinite non-decreasing (γ_k)-Littlestone tree prevents every distribution-free rule from achieving strong consistency. We will add a clarifying remark that highlights this dependence on the metric property and notes that the same combinatorial obstruction need not apply to non-metric losses; the result is stated and proved for the metric setting posed in the open problem. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new combinatorial characterization is self-contained

full rationale

The paper defines a novel obstruction—an infinite non-decreasing (γ_k)-Littlestone tree with γ_k → ∞—and proves it is necessary and sufficient for the non-existence of a distribution-free rule achieving a.s. convergence to zero realizable risk under metric losses. Both directions of the characterization are derived directly from this definition via standard combinatorial arguments that extend (but do not presuppose) the 0-1 and regression cases. No load-bearing step reduces to a fitted parameter, self-citation chain, or renaming of a prior result; the 2022 open-problem citation merely states the question being solved, not the solution itself. The derivation is therefore independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on the new combinatorial definition of the (γ_k)-Littlestone tree and standard background from probability and metric spaces; no free parameters or invented physical entities are introduced.

axioms (2)

standard math Standard axioms of probability spaces and almost-sure convergence
Invoked when stating that risk converges almost surely to zero for every realizable distribution.
domain assumption The label space (Y, ℓ) forms a metric space, possibly with unbounded metric
Required to extend the loss beyond 0-1 and squared error.

invented entities (1)

infinite non-decreasing (γ_k)-Littlestone tree no independent evidence
purpose: Combinatorial obstruction that exactly characterizes when Bayes-consistency fails
New structure introduced to capture the metric-loss case; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5532 in / 1443 out tokens · 36732 ms · 2026-05-15T06:55:04.735507+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

necessary and sufficient conditions on the hypothesis class H ... infinite non-decreasing (γ_k)-Littlestone tree, where γ_k → ∞
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

realizable strong universal Bayes-consistency ... Gale-Stewart game

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1]

Optimal learners for realizable regression: Pac learning and online learning, 2024 a

Attias, I., Hanneke, S., Kalavasis, A., Karbasi, A., and Velegkas, G. Optimal learners for realizable regression: Pac learning and online learning, 2024 a . URL https://arxiv.org/abs/2307.03848

work page arXiv 2024
[2]

Universal Rates for Regression : Separations between Cut - Off and Absolute Loss

Attias, I., Hanneke, S., Kalavasis, A., Karbasi, A., and Velegkas, G. Universal Rates for Regression : Separations between Cut - Off and Absolute Loss . In Proceedings of Thirty Seventh Conference on Learning Theory , pp.\ 359--405. PMLR, June 2024 b . URL https://proceedings.mlr.press/v247/attias24a.html. ISSN: 2640-3498

work page 2024
[3]

A theory of universal learning, 2020

Bousquet, O., Hanneke, S., Moran, S., van Handel, R., and Yehudayoff, A. A theory of universal learning, 2020. URL https://arxiv.org/abs/2011.04483

work page arXiv 2020
[4]

A Characterization of Multiclass Learnability , March 2022

Brukhim, N., Carmon, D., Dinur, I., Moran, S., and Yehudayoff, A. A Characterization of Multiclass Learnability , March 2022. URL http://arxiv.org/abs/2203.01550. arXiv:2203.01550 [cs]

work page arXiv 2022
[6]

and Krauthgamer, R

Gottlieb, L.-A. and Krauthgamer, R. Proximity algorithms for nearly doubling spaces. SIAM J. Discrete Math., 27 0 (4): 0 1759--1769, 2013

work page 2013
[7]

Universal bayes consistency in metric spaces, 2021

Hanneke, S., Kontorovich, A., Sabato, S., and Weiss, R. Universal bayes consistency in metric spaces, 2021. URL https://arxiv.org/abs/1906.09855

work page arXiv 2021
[8]

Universal Rates for Multiclass Learning , July 2023

Hanneke, S., Moran, S., and Zhang, Q. Universal Rates for Multiclass Learning , July 2023. URL http://arxiv.org/abs/2307.02066. arXiv:2307.02066 [cs]

work page arXiv 2023
[9]

Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm

Littlestone, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2 0 (4): 0 285--318, 1988. doi:10.1007/BF00116827

work page doi:10.1007/bf00116827 1988
[10]

and Kontorovich, A

Tsir Cohen , D. and Kontorovich, A. Learning with metric losses. In Loh, P.-L. and Raginsky, M. (eds.), Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pp.\ 662--700. PMLR, 02--05 Jul 2022. URL https://proceedings.mlr.press/v178/cohen22a.html

work page 2022
[11]

2020 , eprint=

A Theory of Universal Learning , author=. 2020 , eprint=

work page 2020
[12]

Universal Rates for Multiclass Learning , July 2023

Hanneke, Steve and Moran, Shay and Zhang, Qian , month = jul, year =. Universal. doi:10.48550/arXiv.2307.02066 , abstract =

work page doi:10.48550/arxiv.2307.02066
[13]

Universal

Attias, Idan and Hanneke, Steve and Kalavasis, Alkis and Karbasi, Amin and Velegkas, Grigoris , month = jun, year =. Universal. Proceedings of

work page
[14]

Cohen, Alon and Erez, Liad and Hanneke, Steve and Koren, Tomer and Mansour, Yishay and Moran, Shay and Zhang, Qian , month = nov, year =. Sample. doi:10.48550/arXiv.2511.12659 , abstract =

work page doi:10.48550/arxiv.2511.12659
[15]

Brukhim, Nataly and Carmon, Daniel and Dinur, Irit and Moran, Shay and Yehudayoff, Amir , month = mar, year =. A. doi:10.48550/arXiv.2203.01550 , abstract =

work page doi:10.48550/arxiv.2203.01550
[16]

Proceedings of Thirty Fifth Conference on Learning Theory , pages =

Learning with metric losses , author =. Proceedings of Thirty Fifth Conference on Learning Theory , pages =. 2022 , editor =

work page 2022
[17]

2021 , eprint=

Universal Bayes consistency in metric spaces , author=. 2021 , eprint=

work page 2021
[18]

2024 , eprint=

Optimal Learners for Realizable Regression: PAC Learning and Online Learning , author=. 2024 , eprint=

work page 2024
[19]

Lee-Ad Gottlieb and Robert Krauthgamer , title =. SIAM J. Discrete Math. , volume =. 2013 , pages =

work page 2013
[20]

Gottlieb, A

Lee. Efficient Classification for Metric Data (extended abstract. 2014 , volume =. doi:10.1109/TIT.2014.2339840 , timestamp =

work page doi:10.1109/tit.2014.2339840 2014
[21]

Machine Learning , volume =

Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm , author =. Machine Learning , volume =. 1988 , publisher =

work page 1988

[1] [1]

Optimal learners for realizable regression: Pac learning and online learning, 2024 a

Attias, I., Hanneke, S., Kalavasis, A., Karbasi, A., and Velegkas, G. Optimal learners for realizable regression: Pac learning and online learning, 2024 a . URL https://arxiv.org/abs/2307.03848

work page arXiv 2024

[2] [2]

Universal Rates for Regression : Separations between Cut - Off and Absolute Loss

Attias, I., Hanneke, S., Kalavasis, A., Karbasi, A., and Velegkas, G. Universal Rates for Regression : Separations between Cut - Off and Absolute Loss . In Proceedings of Thirty Seventh Conference on Learning Theory , pp.\ 359--405. PMLR, June 2024 b . URL https://proceedings.mlr.press/v247/attias24a.html. ISSN: 2640-3498

work page 2024

[3] [3]

A theory of universal learning, 2020

Bousquet, O., Hanneke, S., Moran, S., van Handel, R., and Yehudayoff, A. A theory of universal learning, 2020. URL https://arxiv.org/abs/2011.04483

work page arXiv 2020

[4] [4]

A Characterization of Multiclass Learnability , March 2022

Brukhim, N., Carmon, D., Dinur, I., Moran, S., and Yehudayoff, A. A Characterization of Multiclass Learnability , March 2022. URL http://arxiv.org/abs/2203.01550. arXiv:2203.01550 [cs]

work page arXiv 2022

[5] [6]

and Krauthgamer, R

Gottlieb, L.-A. and Krauthgamer, R. Proximity algorithms for nearly doubling spaces. SIAM J. Discrete Math., 27 0 (4): 0 1759--1769, 2013

work page 2013

[6] [7]

Universal bayes consistency in metric spaces, 2021

Hanneke, S., Kontorovich, A., Sabato, S., and Weiss, R. Universal bayes consistency in metric spaces, 2021. URL https://arxiv.org/abs/1906.09855

work page arXiv 2021

[7] [8]

Universal Rates for Multiclass Learning , July 2023

Hanneke, S., Moran, S., and Zhang, Q. Universal Rates for Multiclass Learning , July 2023. URL http://arxiv.org/abs/2307.02066. arXiv:2307.02066 [cs]

work page arXiv 2023

[8] [9]

Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm

Littlestone, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2 0 (4): 0 285--318, 1988. doi:10.1007/BF00116827

work page doi:10.1007/bf00116827 1988

[9] [10]

and Kontorovich, A

Tsir Cohen , D. and Kontorovich, A. Learning with metric losses. In Loh, P.-L. and Raginsky, M. (eds.), Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pp.\ 662--700. PMLR, 02--05 Jul 2022. URL https://proceedings.mlr.press/v178/cohen22a.html

work page 2022

[10] [11]

2020 , eprint=

A Theory of Universal Learning , author=. 2020 , eprint=

work page 2020

[11] [12]

Universal Rates for Multiclass Learning , July 2023

Hanneke, Steve and Moran, Shay and Zhang, Qian , month = jul, year =. Universal. doi:10.48550/arXiv.2307.02066 , abstract =

work page doi:10.48550/arxiv.2307.02066

[12] [13]

Universal

Attias, Idan and Hanneke, Steve and Kalavasis, Alkis and Karbasi, Amin and Velegkas, Grigoris , month = jun, year =. Universal. Proceedings of

work page

[13] [14]

Cohen, Alon and Erez, Liad and Hanneke, Steve and Koren, Tomer and Mansour, Yishay and Moran, Shay and Zhang, Qian , month = nov, year =. Sample. doi:10.48550/arXiv.2511.12659 , abstract =

work page doi:10.48550/arxiv.2511.12659

[14] [15]

Brukhim, Nataly and Carmon, Daniel and Dinur, Irit and Moran, Shay and Yehudayoff, Amir , month = mar, year =. A. doi:10.48550/arXiv.2203.01550 , abstract =

work page doi:10.48550/arxiv.2203.01550

[15] [16]

Proceedings of Thirty Fifth Conference on Learning Theory , pages =

Learning with metric losses , author =. Proceedings of Thirty Fifth Conference on Learning Theory , pages =. 2022 , editor =

work page 2022

[16] [17]

2021 , eprint=

Universal Bayes consistency in metric spaces , author=. 2021 , eprint=

work page 2021

[17] [18]

2024 , eprint=

Optimal Learners for Realizable Regression: PAC Learning and Online Learning , author=. 2024 , eprint=

work page 2024

[18] [19]

Lee-Ad Gottlieb and Robert Krauthgamer , title =. SIAM J. Discrete Math. , volume =. 2013 , pages =

work page 2013

[19] [20]

Gottlieb, A

Lee. Efficient Classification for Metric Data (extended abstract. 2014 , volume =. doi:10.1109/TIT.2014.2339840 , timestamp =

work page doi:10.1109/tit.2014.2339840 2014

[20] [21]

Machine Learning , volume =

Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm , author =. Machine Learning , volume =. 1988 , publisher =

work page 1988