AI4SLT: Empirical Processes in Lean 4 for Formal Statistical Learning Theory
read the original abstract
We present the first comprehensive Lean 4 formalization of statistical learning theory (SLT) grounded in empirical process theory. Our en-to-end formal infrastructure implement the missing contents in latest Lean library, including a complete development of Gaussian Lipschitz concentration, Dudley's entropy integral theorem for sub-Gaussian processes, and an application to least-squares (sparse) regression with a sharp rate. The project was carried out using a human-AI collaborative workflow, in which humans design proof strategies and AI agents execute tactical proof construction, leading to the human-verified Lean 4 toolbox for SLT. Beyond implementation, the formalization process exposes and resolves implicit assumptions and missing details in standard SLT textbooks, enforcing a granular, line-by-line understanding of the theory. This work establishes a reusable formal foundation and opens the door for future developments in machine learning theory. The code is provided in https://github.com/YuanheZ/lean-stat-learning-theory.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity
Proves an impossibility theorem that no feature attribution ranking can be faithful, stable, and complete under collinearity, characterizes the design space as two families, introduces the DASH ensemble method, and fo...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.