On rates of convergence for sample average approximations without smoothness
Pith reviewed 2026-05-07 16:40 UTC · model grok-4.3
The pith
Uniform control of the empirical objective process transfers directly to convergence rates for optimal values and empirical minimizers without continuity or smoothness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Uniform control of the empirical objective process on ℓ^∞(X) transfers deterministically to convergence rates for optimal values, excess risks of empirical ε-minimizers, and, under a sharp-growth condition, distances to the expected objective solution set; combined with directional differentiability of the infimum functional this yields weak limits at the n^{-1/2} scale.
What carries the argument
The deterministic transfer from uniform deviation bounds on the centered empirical process to the infimum and argmin functionals, formalized via the Hoffmann-Jørgensen outer-probability calculus.
If this is right
- Optimal values inherit the same convergence rate as the uniform deviation of the empirical process.
- Excess risk of any empirical ε-minimizer is bounded by a multiple of the uniform deviation.
- Under sharp growth the distance from an empirical minimizer to the true solution set converges at the same rate.
- Directional differentiability of the infimum yields n^{-1/2} weak convergence of the empirical optimal value.
- LILs and maximal inequalities upgrade the rates to outer almost-sure and outer-mean convergence.
Where Pith is reading between the lines
- The same transfer could be used to obtain rates for empirical risk minimization with non-differentiable surrogate losses.
- Verification of the VC-subgraph and definability conditions on fixed-architecture networks opens the door to rate statements for deep learning without smoothness.
- The framework suggests that tame geometry rather than smoothness is the essential property needed for rate theory in stochastic optimization.
- Analogous results may hold for other variational problems whose objective map is directionally differentiable.
Load-bearing premise
The objective function class satisfies definability, admits an integrable envelope, and forms a VC-subgraph class, with an additional sharp-growth condition needed for distance-to-set rates.
What would settle it
An explicit function class where the empirical process converges uniformly at a known rate yet the optimal value or distance to the solution set converges slower than the predicted rate.
read the original abstract
Sample average approximation (SAA) replaces an intractable expected objective by an empirical average and is a basic device of modern stochastic optimization. We develop a rate theory for optimal values and empirical $\varepsilon$-minimizers that does not assume continuity, lower semicontinuity, or smooth perturbation structure of the sample objectives. Working on $\ell^{\infty}(X)$ with the Hoffmann--J{\o}rgensen outer-probability formalism, we show that uniform control of the empirical objective process transfers deterministically to convergence rates for optimal values, excess risks of empirical $\varepsilon$-minimizers, and, under a sharp-growth condition, distances to the expected objective solution set. Combined with the directional differentiability of the infimum functional, this yields weak limits for empirical optimal values at the $n^{-1/2}$ scale. Combined with LILs and maximal inequalities, it yields outer almost-sure and outer-mean rates. The definability, envelope, and VC-subgraph hypotheses are verified for definable discontinuous or non-Lipschitz classes arising in direct $0$--$1$ classification, fixed-architecture neural networks, threshold regression, and non-Lipschitz $\ell_{p}$-type objectives with rational $0<p<1$. Practical sufficient conditions for measurability hypotheses are discussed. Together, the framework extends continuity-based SAA theory to a tame-topological setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a rate theory for sample average approximations (SAA) that dispenses with continuity, lower semicontinuity, and smoothness assumptions on the sample objectives. Working in ℓ^∞(X) under the Hoffmann-Jørgensen outer-probability formalism, it shows that uniform control of the empirical objective process transfers deterministically via elementary inf and ε-minimality inequalities to rates on optimal values, excess risks of empirical ε-minimizers, and (under a sharp-growth condition) distances to the expected solution set. Directional differentiability of the infimum functional then yields n^{-1/2} weak limits for empirical optimal values, while LILs and maximal inequalities give outer a.s. and mean rates. The required definability, envelope, and VC-subgraph conditions are verified explicitly for discontinuous or non-Lipschitz classes arising in 0-1 classification, fixed-architecture neural networks, threshold regression, and rational ℓ_p objectives (0<p<1), with practical measurability conditions discussed.
Significance. If the central claims hold, the work meaningfully extends SAA convergence theory beyond the smooth/continuous regime to settings that arise routinely in modern stochastic optimization and machine learning. The deterministic transfer step is a clean, assumption-light contribution that relies only on the definitions of inf and ε-minimality. Explicit verification of the empirical-process hypotheses for the listed non-Lipschitz applications, together with the treatment of measurability via outer probabilities, makes the results directly applicable rather than purely abstract. The combination with directional differentiability for weak limits and with LILs for a.s. rates provides a unified framework that previous continuity-based SAA analyses do not cover.
minor comments (3)
- [§4] The sharp-growth condition (used for distance-to-solution-set rates) is introduced without a concrete numerical example or a simple sufficient condition that practitioners could check; adding one in §4 or §5 would improve usability.
- [§6] In the neural-network and threshold-regression applications, the precise envelope and VC-subgraph arguments are stated to hold, but the main text would benefit from a one-paragraph summary of the key definability properties rather than relegating all details to an appendix.
- [§3] Notation for the outer-probability statements (e.g., the distinction between P* and P_*) is introduced early but used inconsistently in the weak-limit section; a short notational table or consistent use of a single symbol would reduce reader effort.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive summary, and recommendation of minor revision. We appreciate the recognition that the deterministic transfer step and explicit verifications for non-Lipschitz applications meaningfully extend SAA theory. No specific major comments were raised, so we have no point-by-point rebuttals to provide. We will address any minor editorial or presentational suggestions in the revised manuscript.
Circularity Check
No significant circularity
full rationale
The derivation chain begins with uniform control of the empirical objective process in the Hoffmann-Jørgensen outer-probability sense on ℓ^∞(X), which transfers deterministically to rates on optimal values and excess risks via elementary inequalities that invoke only the definitions of inf and ε-minimality; these hold for arbitrary functions without continuity assumptions. Probabilistic rates then follow from the paper's explicit verification of envelope, definability, and VC-subgraph conditions for the target discontinuous classes, combined with standard external tools (LILs, maximal inequalities). The n^{-1/2} weak-limit result is obtained by composing the uniform convergence with directional differentiability of the infimum map, which is stated as an explicit assumption rather than derived internally. No step reduces by construction to a fitted parameter, self-definition, or self-citation chain; all load-bearing elements rest on independent external results or direct verification.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption Definability, envelope, and VC-subgraph hypotheses hold for the target discontinuous or non-Lipschitz classes
- domain assumption Directional differentiability of the infimum functional
- domain assumption Sharp-growth condition
Reference graph
Works this paper leans on
-
[1]
M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009
work page 2009
-
[2]
D. Banholzer, J. Fliege, and R. Werner. On rates of convergence for sample average approximations in the almost sure sense and in mean. Mathematical Programming, 191:307--345, 2022
work page 2022
- [3]
-
[4]
J. Bolte and E. Pauwels. Majorization--minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Mathematics of Operations Research, 41:442--465, 2016
work page 2016
-
[5]
Y. Cui and J.-S. Pang. Modern Nonconvex Nondifferentiable Optimization. SIAM and MOS, 2022
work page 2022
-
[6]
A. Buccini, O. De la Cruz Cabrera, M. Donatelli, A. Martinelli, and L. Reichel. Large-scale regression with non-convex loss and penalty. Applied Numerical Mathematics, 157:590--601, 2020
work page 2020
-
[7]
C. A. Floudas. Deterministic Global Optimization: Theory, Methods and Applications. Kluwer Academic Publishers, 2000
work page 2000
-
[8]
M. Locatelli and F. Schoen. Global Optimization: Theory, Algorithms, and Applications. SIAM and MOS, 2013
work page 2013
-
[9]
A. D. Ioffe. An invitation to tame optimization. SIAM Journal on Optimization, 19:1894--1917, 2009
work page 1917
-
[10]
S. Kim, R. Pasupathy, and S. G. Henderson. A guide to sample average approximation. In M. C. Fu, editor, Handbook of Simulation Optimization, pages 207--243. Springer, 2015
work page 2015
-
[11]
J. Kuelbs. A strong convergence theorem for B anach space valued random variables. Annals of Probability, 4:744--771, 1976
work page 1976
-
[12]
J. Kuelbs. Kolmogorov's law of the iterated logarithm for B anach space valued random variables. Illinois Journal of Mathematics, 21:784--800, 1977
work page 1977
-
[13]
T. Lê Loi. O-minimal structures. In Proceedings of the Centre for Mathematics and Its Applications, volume 43, pages 19--30, 2010
work page 2010
-
[14]
J. L. Montaña and L. M. Pardo. On the V apnik-- C hervonenkis dimension of computer programs which use transcendental elementary operations. Annals of Mathematics and Artificial Intelligence, 56:371--388, 2009
work page 2009
-
[15]
J. Pila. Point-Counting and the Zilber--Pink Conjecture. Cambridge University Press, 2022
work page 2022
-
[16]
A. Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30:169--186, 1991
work page 1991
-
[17]
A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, third edition, 2021
work page 2021
-
[18]
B. E. Hansen. Sample splitting and threshold estimation. Econometrica, 68:575--603, 2000
work page 2000
-
[19]
H. L. Koul, L. Qian, and D. Surgailis. Asymptotics of M -estimators in two-phase linear regression models. Stochastic Processes and their Applications, 103:123--154, 2003
work page 2003
-
[20]
L. Li and H.-T. Lin. Optimizing 0/1 loss for perceptrons by random coordinate descent. In Proceedings of the International Joint Conference on Neural Networks, pages 749--754. IEEE, 2007
work page 2007
-
[21]
T. T. Nguyen and S. Sanner. Algorithms for direct 0 -- 1 loss optimization in binary classification. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 1085--1093, 2013
work page 2013
-
[22]
L. van den Dries. Tame Topology and O-minimal Structures. Cambridge University Press, 1998
work page 1998
-
[23]
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, second edition, 2023
work page 2023
-
[24]
J. Westerhout, T. T. Nguyen, X. Guo, and H. D. Nguyen. On the asymptotic distribution of the minimum empirical risk. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 52869--52902, 2024
work page 2024
-
[25]
Y.-F. Ye, C. Ying, Y.-H. Shao, C.-N. Li, and Y.-J. Chen. Robust and sparse LP -norm support vector regression. Journal of Advanced Computational Intelligence and Intelligent Informatics, 21:989--997, 2017
work page 2017
-
[26]
J. E. Yukich. The law of the iterated logarithm for empirical processes. In U. Haagerup, J. Hoffmann-Jørgensen, and M. B. Marcus, editors, Probability in Banach Spaces 6, pages 265--282. Birkhäuser, 1990
work page 1990
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.