pith. sign in

arxiv: 2503.19605 · v5 · pith:RFV4GZTOnew · submitted 2025-03-25 · 💻 cs.LG · cs.CL· math.ST· stat.TH

Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral

classification 💻 cs.LG cs.CLmath.STstat.TH
keywords complexityrademachererrorboundboundsgeneralizationtheorycountable
0
0 comments X
read the original abstract

Understanding and certifying the generalization performance of machine learning algorithms -- i.e. obtaining theoretical estimates of the test error from the training error -- is a central theme of statistical learning theory. Among the many complexity measures used to derive such guarantees, Rademacher complexity yields sharp, data-dependent bounds that apply well beyond classical VC-dimension theory. In this study, we formalize the generalization error bound by Rademacher complexity in Lean 4, building on measure-theoretic probability theory available in the Mathlib library. Our development provides a mechanically-checked pipeline from the definitions of empirical and expected Rademacher complexity, through a formal symmetrization argument and a bounded-differences analysis, to high-probability uniform deviation bounds via a formally proved McDiarmid inequality. A key technical contribution is a reusable mechanism for lifting results from countable hypothesis classes (where measurability of suprema is straightforward in Mathlib) to separable topological index sets via a reduction to a countable dense subset. As worked applications of the abstract theorem, we mechanize standard empirical Rademacher bounds for linear predictors under $\ell_2$ and $\ell_1$ regularizations, and we also formalize a Dudley-type entropy integral bound based on covering numbers and a chaining construction.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.