Null Measurability at the Symmetrization Interface in VC Learning

Dhruv Gupta

arxiv: 2604.25028 · v1 · submitted 2026-04-27 · 💻 cs.LG · cs.LO· stat.ML

Null Measurability at the Symmetrization Interface in VC Learning

Dhruv Gupta This is my paper

Pith reviewed 2026-05-08 03:53 UTC · model grok-4.3

classification 💻 cs.LG cs.LOstat.ML

keywords VC dimensionmeasurabilityanalytic setssymmetrizationPAC learningghost empirical errorChoquet capacitabilitydescriptive set theory

0 comments

The pith

For Borel-parameterized concept classes on Polish domains the one-sided ghost-gap event is analytic and measurable in every Borel measure completion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recent work required Borel measurability of ghost-gap suprema for the fundamental theorem of statistical learning. This paper shows that for the specific bad event used in the standard symmetrization proof, analyticity suffices. For Borel-parameterized classes on Polish spaces, this event is analytic and thus measurable with respect to the completion of any finite Borel measure. The authors construct a class where the event is null-measurable but not Borel, establishing a strict separation. They also prove that this level of regularity is preserved under several natural operations on concept classes, and in the realizable case this relaxes the hypothesis for deriving PAC learnability from finite VC dimension.

Core claim

For any Borel-parameterized concept class on a Polish domain, the bad event 'there exists a hypothesis whose ghost empirical error exceeds its training empirical error by at least ε/2' is analytic. By Choquet capacitability, it is therefore measurable in the completion of every finite Borel measure. We then construct a concept class whose bad event is null-measurable but not Borel, giving a strict separation from the Borel supremum condition.

What carries the argument

The analyticity of the one-sided ghost-gap bad event, which by Choquet capacitability guarantees measurability in the completion of any finite Borel measure.

If this is right

The symmetrization argument yields PAC learnability from finite VC dimension under this weaker measurability condition in the realizable setting.
The analytic/null-measurable level is closed under patching, fixed and countable interpolation, and fiber-product amalgamation.
There exists a strict separation: some concept classes have null-measurable ghost-gap events that are not Borel.
The results apply directly to the one-sided interface used by the standard symmetrization proof rather than to two-sided suprema.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same analyticity argument may relax measurability assumptions in other uniform-convergence proofs that rely on symmetrization.
The closure properties suggest that many naturally occurring classes built from simpler ones will inherit the weaker regularity.
The Lean 4 formalization indicates the descriptive-set-theoretic steps can be verified mechanically for related theorems.

Load-bearing premise

The domain is Polish and the concept class is Borel-parameterized.

What would settle it

A Borel-parameterized concept class on a Polish domain whose ghost-gap bad event fails to be measurable in the completion of some finite Borel measure.

read the original abstract

Recent work revisiting measurability in the fundamental theorem of statistical learning imposes Borel measurability of ghost-gap suprema. We show that, at the one-sided ghost-gap interface actually used by the standard symmetrization proof, this requirement is stronger than necessary. For any Borel-parameterized concept class on a Polish domain, the bad event "there exists a hypothesis whose ghost empirical error exceeds its training empirical error by at least {\epsilon}/2" is analytic. By Choquet capacitability, it is therefore measurable in the completion of every finite Borel measure. We then construct a concept class whose bad event is null-measurable but not Borel, giving a strict separation from the Borel supremum condition. Finally, we prove closure under patching, fixed and countable interpolation, and fiber-product amalgamation, showing that the weaker regularity level is stable under natural concept-class constructors. In the realizable setting, where targets belong to the class and are measurable, these results weaken the measurability hypothesis needed by the symmetrization route from finite VC dimension to PAC learnability. The main results and the descriptive-set-theoretic infrastructure used by them are formalized in Lean 4.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows analytic measurability is enough for the symmetrization bad event in VC theory on Polish domains, with a clean separation from Borel and Lean-backed closure properties.

read the letter

The core point here is that the symmetrization argument only needs the one-sided ghost-gap bad event to be analytic rather than Borel. For any Borel-parameterized class on a Polish space, that event is analytic, so Choquet capacitability makes it measurable in the completion of every finite Borel measure. They then give an explicit construction of a class where the event is null-measurable but not Borel, and they prove the property is preserved under patching, fixed and countable interpolation, and fiber-product amalgamation. The whole thing, including the descriptive-set-theory setup, is formalized in Lean 4.

Referee Report

0 major / 3 minor

Summary. The paper argues that the one-sided ghost-gap bad event arising in the standard symmetrization argument—i.e., the existence of a hypothesis whose ghost empirical error exceeds its training empirical error by at least ε/2—is an analytic set whenever the concept class is Borel-parameterized on a Polish domain. Choquet capacitability then yields measurability in the completion of every finite Borel measure. The authors supply an explicit separating construction of a class whose bad event is null-measurable but not Borel, establish closure of this regularity level under patching, fixed/countable interpolation and fiber-product amalgamation, and conclude that the symmetrization route from finite VC dimension to PAC learnability requires only this weaker condition in the realizable setting. All main results and the underlying descriptive-set-theoretic infrastructure are formalized in Lean 4.

Significance. If the claims hold, the work sharpens the measurability hypotheses in the fundamental theorem of statistical learning by showing that analyticity (rather than Borel measurability of the supremum) suffices at the symmetrization interface actually used by the proof. The explicit separation example demonstrates that the two regularity notions are strictly distinct, while the closure results indicate stability under natural class constructors. A notable strength is the Lean 4 formalization of both the analyticity argument and the supporting descriptive-set-theory lemmas, supplying machine-checked verification that increases in the technical correctness.

minor comments (3)

[Abstract] Abstract: the list of closure operations (patching, fixed and countable interpolation, fiber-product amalgamation) is given without even a one-sentence gloss; a parenthetical reference to the section where each is defined would improve immediate readability.
[Formalization section] §2 (or wherever the Lean formalization is described): a short table or enumerated list mapping the formalized theorems to the corresponding numbered statements in the paper would help readers who do not inspect the repository.
[Introduction / Preliminaries] Notation for the ghost empirical error and the one-sided bad event is introduced gradually; an early, self-contained display equation collecting the definitions would make the analyticity claim easier to track.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the paper, recognition of its significance in relaxing measurability requirements at the symmetrization interface, and the recommendation for minor revision. We are particularly pleased that the Lean 4 formalization is highlighted as increasing technical confidence. As the report lists no specific major comments under the MAJOR COMMENTS section, we have no point-by-point responses to major comments.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper derives analyticity of the one-sided ghost-gap bad event directly from the input assumption of Borel parameterization on a Polish domain, using standard descriptive-set-theoretic closure properties rather than any redefinition or self-referential construction. Choquet capacitability is invoked as an external theorem to obtain measurability in measure completions. The separation example is an explicit construction of a concept class, and closure under patching, interpolation, and amalgamation is proven by direct arguments. The Lean 4 formalization supplies independent machine-checked verification of the infrastructure and main claims. No steps reduce by construction to fitted parameters, self-citations, or ansatzes; the chain relies on external mathematical facts and formalization.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from descriptive set theory and topology; no free parameters or invented entities are introduced.

axioms (2)

domain assumption The domain is a Polish space
Required for the bad event to be analytic.
domain assumption The concept class is Borel-parameterized
Assumption on how the hypothesis class is indexed.

pith-pipeline@v0.9.0 · 5497 in / 1545 out tokens · 75335 ms · 2026-05-08T03:53:52.094270+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1]

Blumer, A

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik– Chervonenkis dimension.Journal of the ACM, 36(4):929–965, 1989

work page 1989
[2]

Bousquet, S

O. Bousquet, S. Hanneke, S. Moran, R. van Handel, and A. Yehudayoff. A theory of universal learning. InProceedings of the 53rd Annual ACM Symposium on Theory of Computing, pages 532–541, 2021

work page 2021
[3]

G. Choquet. Theory of capacities.Annales de l’Institut Fourier, 5:131–295, 1954

work page 1954
[4]

de Moura and S

L. de Moura and S. Ullrich. The Lean 4 theorem prover and programming language. In Automated Deduction – CADE 28, pages 625–635, 2021

work page 2021
[5]

R. M. Dudley. A course on empirical processes. InÉcole d’Été de Probabilités de Saint-Flour XII–1982, volume 1097 ofLecture Notes in Mathematics, pages 1–142. Springer, 1984

work page 1982
[6]

R. M. Dudley and R. Durst. Empirical processes, Vapnik–Chervonenkis classes and Poisson processes.Probability and Mathematical Statistics, 1(2):109–115, 1981

work page 1981
[7]

Formal Learning Theory Kernel: a Lean 4 formalization of the fundamental theorem of statistical learning

Dhruv Gupta. Formal Learning Theory Kernel: a Lean 4 formalization of the fundamental theorem of statistical learning. Version v3.3.0-paper, commite48c0f6, 2026. 11

work page 2026
[8]

A. S. Kechris.Classical Descriptive Set Theory. Springer, 1995

work page 1995
[9]

L. S. Krapp, M. Vermeil, and L. Wirth. On tameness, measurability and the independence property, 2025. arXiv:2506.08733

work page arXiv 2025
[10]

L. S. Krapp and L. Wirth. Measurability in the fundamental theorem of statistical learning,

work page
[11]

Comparator: a trustworthy judge for Lean proofs

Lean Prover Community. Comparator: a trustworthy judge for Lean proofs. Software, 2025

work page 2025
[12]

Moran and A

S. Moran and A. Yehudayoff. Sample compression schemes for VC classes.Journal of the ACM, 63(3):21:1–21:21, 2016

work page 2016
[13]

V. Pestov. PAC learnability versus VC dimension: a footnote to a basic result of statistical learning. InInternational Joint Conference on Neural Networks, pages 1141–1145, 2011. Also arXiv:1104.2097

work page arXiv 2011
[14]

Pollard.Convergence of Stochastic Processes

D. Pollard.Convergence of Stochastic Processes. Springer Series in Statistics. Springer, 1984

work page 1984
[15]

N. Sauer. On the density of families of sets.Journal of Combinatorial Theory, Series A, 13(1):145–147, 1972

work page 1972
[16]

Shalev-Shwartz and S

S. Shalev-Shwartz and S. Ben-David.Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014

work page 2014
[17]

S. Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages.Pacific Journal of Mathematics, 41(1):247–261, 1972

work page 1972
[18]

M. Suslin. Sur une définition des ensembles mesurables B sans nombres transfinis.Comptes Rendus de l’Académie des Sciences, 164:88–91, 1917

work page 1917
[19]

A. W. van der Vaart and J. A. Wellner.Weak Convergence and Empirical Processes. Springer, 1996

work page 1996
[20]

A. W. van der Vaart and J. A. Wellner. Preservation theorems for Glivenko–Cantelli and uniform Glivenko–Cantelli classes. In E. Giné, D. M. Mason, and J. A. Wellner, editors,High Dimensional Probability II, volume 47 ofProgress in Probability, pages 115–133. Birkhäuser, 2000

work page 2000
[21]

V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities.Theory of Probability and its Applications, 16(2):264–280, 1971. 12

work page 1971

[1] [1]

Blumer, A

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik– Chervonenkis dimension.Journal of the ACM, 36(4):929–965, 1989

work page 1989

[2] [2]

Bousquet, S

O. Bousquet, S. Hanneke, S. Moran, R. van Handel, and A. Yehudayoff. A theory of universal learning. InProceedings of the 53rd Annual ACM Symposium on Theory of Computing, pages 532–541, 2021

work page 2021

[3] [3]

G. Choquet. Theory of capacities.Annales de l’Institut Fourier, 5:131–295, 1954

work page 1954

[4] [4]

de Moura and S

L. de Moura and S. Ullrich. The Lean 4 theorem prover and programming language. In Automated Deduction – CADE 28, pages 625–635, 2021

work page 2021

[5] [5]

R. M. Dudley. A course on empirical processes. InÉcole d’Été de Probabilités de Saint-Flour XII–1982, volume 1097 ofLecture Notes in Mathematics, pages 1–142. Springer, 1984

work page 1982

[6] [6]

R. M. Dudley and R. Durst. Empirical processes, Vapnik–Chervonenkis classes and Poisson processes.Probability and Mathematical Statistics, 1(2):109–115, 1981

work page 1981

[7] [7]

Formal Learning Theory Kernel: a Lean 4 formalization of the fundamental theorem of statistical learning

Dhruv Gupta. Formal Learning Theory Kernel: a Lean 4 formalization of the fundamental theorem of statistical learning. Version v3.3.0-paper, commite48c0f6, 2026. 11

work page 2026

[8] [8]

A. S. Kechris.Classical Descriptive Set Theory. Springer, 1995

work page 1995

[9] [9]

L. S. Krapp, M. Vermeil, and L. Wirth. On tameness, measurability and the independence property, 2025. arXiv:2506.08733

work page arXiv 2025

[10] [10]

L. S. Krapp and L. Wirth. Measurability in the fundamental theorem of statistical learning,

work page

[11] [11]

Comparator: a trustworthy judge for Lean proofs

Lean Prover Community. Comparator: a trustworthy judge for Lean proofs. Software, 2025

work page 2025

[12] [12]

Moran and A

S. Moran and A. Yehudayoff. Sample compression schemes for VC classes.Journal of the ACM, 63(3):21:1–21:21, 2016

work page 2016

[13] [13]

V. Pestov. PAC learnability versus VC dimension: a footnote to a basic result of statistical learning. InInternational Joint Conference on Neural Networks, pages 1141–1145, 2011. Also arXiv:1104.2097

work page arXiv 2011

[14] [14]

Pollard.Convergence of Stochastic Processes

D. Pollard.Convergence of Stochastic Processes. Springer Series in Statistics. Springer, 1984

work page 1984

[15] [15]

N. Sauer. On the density of families of sets.Journal of Combinatorial Theory, Series A, 13(1):145–147, 1972

work page 1972

[16] [16]

Shalev-Shwartz and S

S. Shalev-Shwartz and S. Ben-David.Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014

work page 2014

[17] [17]

S. Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages.Pacific Journal of Mathematics, 41(1):247–261, 1972

work page 1972

[18] [18]

M. Suslin. Sur une définition des ensembles mesurables B sans nombres transfinis.Comptes Rendus de l’Académie des Sciences, 164:88–91, 1917

work page 1917

[19] [19]

A. W. van der Vaart and J. A. Wellner.Weak Convergence and Empirical Processes. Springer, 1996

work page 1996

[20] [20]

A. W. van der Vaart and J. A. Wellner. Preservation theorems for Glivenko–Cantelli and uniform Glivenko–Cantelli classes. In E. Giné, D. M. Mason, and J. A. Wellner, editors,High Dimensional Probability II, volume 47 ofProgress in Probability, pages 115–133. Birkhäuser, 2000

work page 2000

[21] [21]

V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities.Theory of Probability and its Applications, 16(2):264–280, 1971. 12

work page 1971