pith. sign in

arxiv: 2604.25153 · v1 · submitted 2026-04-28 · 🧮 math.OC

On rates of convergence for sample average approximations without smoothness

Pith reviewed 2026-05-07 16:40 UTC · model grok-4.3

classification 🧮 math.OC
keywords sample average approximationstochastic optimizationrates of convergenceempirical processesnon-Lipschitz functionsVC-subgraph classesneural networks0-1 classification
0
0 comments X

The pith

Uniform control of the empirical objective process transfers directly to convergence rates for optimal values and empirical minimizers without continuity or smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops rates of convergence for optimal values, excess risks of ε-minimizers, and distances to solution sets in sample average approximation. It shows that any uniform bound on how empirical objectives approximate the expected objective immediately yields explicit rates for these quantities, even when the objectives are discontinuous or fail to be Lipschitz. The transfer holds in a general setting using outer probability and applies after verifying definability, envelope, and VC-subgraph conditions on the function class. When the infimum functional is directionally differentiable, the same control produces weak convergence of the optimal value at the sqrt(n) scale; laws of the iterated logarithm then give almost-sure rates.

Core claim

Uniform control of the empirical objective process on ℓ^∞(X) transfers deterministically to convergence rates for optimal values, excess risks of empirical ε-minimizers, and, under a sharp-growth condition, distances to the expected objective solution set; combined with directional differentiability of the infimum functional this yields weak limits at the n^{-1/2} scale.

What carries the argument

The deterministic transfer from uniform deviation bounds on the centered empirical process to the infimum and argmin functionals, formalized via the Hoffmann-Jørgensen outer-probability calculus.

If this is right

  • Optimal values inherit the same convergence rate as the uniform deviation of the empirical process.
  • Excess risk of any empirical ε-minimizer is bounded by a multiple of the uniform deviation.
  • Under sharp growth the distance from an empirical minimizer to the true solution set converges at the same rate.
  • Directional differentiability of the infimum yields n^{-1/2} weak convergence of the empirical optimal value.
  • LILs and maximal inequalities upgrade the rates to outer almost-sure and outer-mean convergence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transfer could be used to obtain rates for empirical risk minimization with non-differentiable surrogate losses.
  • Verification of the VC-subgraph and definability conditions on fixed-architecture networks opens the door to rate statements for deep learning without smoothness.
  • The framework suggests that tame geometry rather than smoothness is the essential property needed for rate theory in stochastic optimization.
  • Analogous results may hold for other variational problems whose objective map is directionally differentiable.

Load-bearing premise

The objective function class satisfies definability, admits an integrable envelope, and forms a VC-subgraph class, with an additional sharp-growth condition needed for distance-to-set rates.

What would settle it

An explicit function class where the empirical process converges uniformly at a known rate yet the optimal value or distance to the solution set converges slower than the predicted rate.

read the original abstract

Sample average approximation (SAA) replaces an intractable expected objective by an empirical average and is a basic device of modern stochastic optimization. We develop a rate theory for optimal values and empirical $\varepsilon$-minimizers that does not assume continuity, lower semicontinuity, or smooth perturbation structure of the sample objectives. Working on $\ell^{\infty}(X)$ with the Hoffmann--J{\o}rgensen outer-probability formalism, we show that uniform control of the empirical objective process transfers deterministically to convergence rates for optimal values, excess risks of empirical $\varepsilon$-minimizers, and, under a sharp-growth condition, distances to the expected objective solution set. Combined with the directional differentiability of the infimum functional, this yields weak limits for empirical optimal values at the $n^{-1/2}$ scale. Combined with LILs and maximal inequalities, it yields outer almost-sure and outer-mean rates. The definability, envelope, and VC-subgraph hypotheses are verified for definable discontinuous or non-Lipschitz classes arising in direct $0$--$1$ classification, fixed-architecture neural networks, threshold regression, and non-Lipschitz $\ell_{p}$-type objectives with rational $0<p<1$. Practical sufficient conditions for measurability hypotheses are discussed. Together, the framework extends continuity-based SAA theory to a tame-topological setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript develops a rate theory for sample average approximations (SAA) that dispenses with continuity, lower semicontinuity, and smoothness assumptions on the sample objectives. Working in ℓ^∞(X) under the Hoffmann-Jørgensen outer-probability formalism, it shows that uniform control of the empirical objective process transfers deterministically via elementary inf and ε-minimality inequalities to rates on optimal values, excess risks of empirical ε-minimizers, and (under a sharp-growth condition) distances to the expected solution set. Directional differentiability of the infimum functional then yields n^{-1/2} weak limits for empirical optimal values, while LILs and maximal inequalities give outer a.s. and mean rates. The required definability, envelope, and VC-subgraph conditions are verified explicitly for discontinuous or non-Lipschitz classes arising in 0-1 classification, fixed-architecture neural networks, threshold regression, and rational ℓ_p objectives (0<p<1), with practical measurability conditions discussed.

Significance. If the central claims hold, the work meaningfully extends SAA convergence theory beyond the smooth/continuous regime to settings that arise routinely in modern stochastic optimization and machine learning. The deterministic transfer step is a clean, assumption-light contribution that relies only on the definitions of inf and ε-minimality. Explicit verification of the empirical-process hypotheses for the listed non-Lipschitz applications, together with the treatment of measurability via outer probabilities, makes the results directly applicable rather than purely abstract. The combination with directional differentiability for weak limits and with LILs for a.s. rates provides a unified framework that previous continuity-based SAA analyses do not cover.

minor comments (3)
  1. [§4] The sharp-growth condition (used for distance-to-solution-set rates) is introduced without a concrete numerical example or a simple sufficient condition that practitioners could check; adding one in §4 or §5 would improve usability.
  2. [§6] In the neural-network and threshold-regression applications, the precise envelope and VC-subgraph arguments are stated to hold, but the main text would benefit from a one-paragraph summary of the key definability properties rather than relegating all details to an appendix.
  3. [§3] Notation for the outer-probability statements (e.g., the distinction between P* and P_*) is introduced early but used inconsistently in the weak-limit section; a short notational table or consistent use of a single symbol would reduce reader effort.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading, positive summary, and recommendation of minor revision. We appreciate the recognition that the deterministic transfer step and explicit verifications for non-Lipschitz applications meaningfully extend SAA theory. No specific major comments were raised, so we have no point-by-point rebuttals to provide. We will address any minor editorial or presentational suggestions in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation chain begins with uniform control of the empirical objective process in the Hoffmann-Jørgensen outer-probability sense on ℓ^∞(X), which transfers deterministically to rates on optimal values and excess risks via elementary inequalities that invoke only the definitions of inf and ε-minimality; these hold for arbitrary functions without continuity assumptions. Probabilistic rates then follow from the paper's explicit verification of envelope, definability, and VC-subgraph conditions for the target discontinuous classes, combined with standard external tools (LILs, maximal inequalities). The n^{-1/2} weak-limit result is obtained by composing the uniform convergence with directional differentiability of the infimum map, which is stated as an explicit assumption rather than derived internally. No step reduces by construction to a fitted parameter, self-definition, or self-citation chain; all load-bearing elements rest on independent external results or direct verification.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The framework rests on standard probability and empirical process tools plus domain assumptions about function classes; no free parameters or new entities are introduced.

axioms (3)
  • domain assumption Definability, envelope, and VC-subgraph hypotheses hold for the target discontinuous or non-Lipschitz classes
    Invoked to verify the uniform control and measurability conditions for applications such as 0-1 classification and non-Lipschitz lp objectives.
  • domain assumption Directional differentiability of the infimum functional
    Used to obtain weak limits at the n^{-1/2} scale.
  • domain assumption Sharp-growth condition
    Required for transferring rates to distances to the expected solution set.

pith-pipeline@v0.9.0 · 5542 in / 1440 out tokens · 44025 ms · 2026-05-07T16:40:23.778551+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

  1. [1]

    Anthony and P

    M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009

  2. [2]

    Banholzer, J

    D. Banholzer, J. Fliege, and R. Werner. On rates of convergence for sample average approximations in the almost sure sense and in mean. Mathematical Programming, 191:307--345, 2022

  3. [3]

    Bolte, A

    J. Bolte, A. Daniilidis, and A. Lewis. Tame functions are semismooth. Mathematical Programming, 117:5--19, 2009

  4. [4]

    Bolte and E

    J. Bolte and E. Pauwels. Majorization--minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Mathematics of Operations Research, 41:442--465, 2016

  5. [5]

    Cui and J.-S

    Y. Cui and J.-S. Pang. Modern Nonconvex Nondifferentiable Optimization. SIAM and MOS, 2022

  6. [6]

    Buccini, O

    A. Buccini, O. De la Cruz Cabrera, M. Donatelli, A. Martinelli, and L. Reichel. Large-scale regression with non-convex loss and penalty. Applied Numerical Mathematics, 157:590--601, 2020

  7. [7]

    C. A. Floudas. Deterministic Global Optimization: Theory, Methods and Applications. Kluwer Academic Publishers, 2000

  8. [8]

    Locatelli and F

    M. Locatelli and F. Schoen. Global Optimization: Theory, Algorithms, and Applications. SIAM and MOS, 2013

  9. [9]

    A. D. Ioffe. An invitation to tame optimization. SIAM Journal on Optimization, 19:1894--1917, 2009

  10. [10]

    S. Kim, R. Pasupathy, and S. G. Henderson. A guide to sample average approximation. In M. C. Fu, editor, Handbook of Simulation Optimization, pages 207--243. Springer, 2015

  11. [11]

    J. Kuelbs. A strong convergence theorem for B anach space valued random variables. Annals of Probability, 4:744--771, 1976

  12. [12]

    J. Kuelbs. Kolmogorov's law of the iterated logarithm for B anach space valued random variables. Illinois Journal of Mathematics, 21:784--800, 1977

  13. [13]

    T. Lê Loi. O-minimal structures. In Proceedings of the Centre for Mathematics and Its Applications, volume 43, pages 19--30, 2010

  14. [14]

    J. L. Montaña and L. M. Pardo. On the V apnik-- C hervonenkis dimension of computer programs which use transcendental elementary operations. Annals of Mathematics and Artificial Intelligence, 56:371--388, 2009

  15. [15]

    J. Pila. Point-Counting and the Zilber--Pink Conjecture. Cambridge University Press, 2022

  16. [16]

    A. Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30:169--186, 1991

  17. [17]

    Shapiro, D

    A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, third edition, 2021

  18. [18]

    B. E. Hansen. Sample splitting and threshold estimation. Econometrica, 68:575--603, 2000

  19. [19]

    H. L. Koul, L. Qian, and D. Surgailis. Asymptotics of M -estimators in two-phase linear regression models. Stochastic Processes and their Applications, 103:123--154, 2003

  20. [20]

    Li and H.-T

    L. Li and H.-T. Lin. Optimizing 0/1 loss for perceptrons by random coordinate descent. In Proceedings of the International Joint Conference on Neural Networks, pages 749--754. IEEE, 2007

  21. [21]

    T. T. Nguyen and S. Sanner. Algorithms for direct 0 -- 1 loss optimization in binary classification. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 1085--1093, 2013

  22. [22]

    van den Dries

    L. van den Dries. Tame Topology and O-minimal Structures. Cambridge University Press, 1998

  23. [23]

    A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, second edition, 2023

  24. [24]

    Westerhout, T

    J. Westerhout, T. T. Nguyen, X. Guo, and H. D. Nguyen. On the asymptotic distribution of the minimum empirical risk. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 52869--52902, 2024

  25. [25]

    Y.-F. Ye, C. Ying, Y.-H. Shao, C.-N. Li, and Y.-J. Chen. Robust and sparse LP -norm support vector regression. Journal of Advanced Computational Intelligence and Intelligent Informatics, 21:989--997, 2017

  26. [26]

    J. E. Yukich. The law of the iterated logarithm for empirical processes. In U. Haagerup, J. Hoffmann-Jørgensen, and M. B. Marcus, editors, Probability in Banach Spaces 6, pages 265--282. Birkhäuser, 1990