On rates of convergence for sample average approximations without smoothness

Hien Duy Nguyen; Jacob Westerhout; Xin Guo

arxiv: 2604.25153 · v1 · submitted 2026-04-28 · 🧮 math.OC

On rates of convergence for sample average approximations without smoothness

Hien Duy Nguyen , Jacob Westerhout , Xin Guo This is my paper

Pith reviewed 2026-05-07 16:40 UTC · model grok-4.3

classification 🧮 math.OC

keywords sample average approximationstochastic optimizationrates of convergenceempirical processesnon-Lipschitz functionsVC-subgraph classesneural networks0-1 classification

0 comments

The pith

Uniform control of the empirical objective process transfers directly to convergence rates for optimal values and empirical minimizers without continuity or smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops rates of convergence for optimal values, excess risks of ε-minimizers, and distances to solution sets in sample average approximation. It shows that any uniform bound on how empirical objectives approximate the expected objective immediately yields explicit rates for these quantities, even when the objectives are discontinuous or fail to be Lipschitz. The transfer holds in a general setting using outer probability and applies after verifying definability, envelope, and VC-subgraph conditions on the function class. When the infimum functional is directionally differentiable, the same control produces weak convergence of the optimal value at the sqrt(n) scale; laws of the iterated logarithm then give almost-sure rates.

Core claim

Uniform control of the empirical objective process on ℓ^∞(X) transfers deterministically to convergence rates for optimal values, excess risks of empirical ε-minimizers, and, under a sharp-growth condition, distances to the expected objective solution set; combined with directional differentiability of the infimum functional this yields weak limits at the n^{-1/2} scale.

What carries the argument

The deterministic transfer from uniform deviation bounds on the centered empirical process to the infimum and argmin functionals, formalized via the Hoffmann-Jørgensen outer-probability calculus.

If this is right

Optimal values inherit the same convergence rate as the uniform deviation of the empirical process.
Excess risk of any empirical ε-minimizer is bounded by a multiple of the uniform deviation.
Under sharp growth the distance from an empirical minimizer to the true solution set converges at the same rate.
Directional differentiability of the infimum yields n^{-1/2} weak convergence of the empirical optimal value.
LILs and maximal inequalities upgrade the rates to outer almost-sure and outer-mean convergence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same transfer could be used to obtain rates for empirical risk minimization with non-differentiable surrogate losses.
Verification of the VC-subgraph and definability conditions on fixed-architecture networks opens the door to rate statements for deep learning without smoothness.
The framework suggests that tame geometry rather than smoothness is the essential property needed for rate theory in stochastic optimization.
Analogous results may hold for other variational problems whose objective map is directionally differentiable.

Load-bearing premise

The objective function class satisfies definability, admits an integrable envelope, and forms a VC-subgraph class, with an additional sharp-growth condition needed for distance-to-set rates.

What would settle it

An explicit function class where the empirical process converges uniformly at a known rate yet the optimal value or distance to the solution set converges slower than the predicted rate.

read the original abstract

Sample average approximation (SAA) replaces an intractable expected objective by an empirical average and is a basic device of modern stochastic optimization. We develop a rate theory for optimal values and empirical $\varepsilon$-minimizers that does not assume continuity, lower semicontinuity, or smooth perturbation structure of the sample objectives. Working on $\ell^{\infty}(X)$ with the Hoffmann--J{\o}rgensen outer-probability formalism, we show that uniform control of the empirical objective process transfers deterministically to convergence rates for optimal values, excess risks of empirical $\varepsilon$-minimizers, and, under a sharp-growth condition, distances to the expected objective solution set. Combined with the directional differentiability of the infimum functional, this yields weak limits for empirical optimal values at the $n^{-1/2}$ scale. Combined with LILs and maximal inequalities, it yields outer almost-sure and outer-mean rates. The definability, envelope, and VC-subgraph hypotheses are verified for definable discontinuous or non-Lipschitz classes arising in direct $0$--$1$ classification, fixed-architecture neural networks, threshold regression, and non-Lipschitz $\ell_{p}$-type objectives with rational $0<p<1$. Practical sufficient conditions for measurability hypotheses are discussed. Together, the framework extends continuity-based SAA theory to a tame-topological setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This extends SAA rate theory to discontinuous and non-smooth objectives using outer probabilities and tame topology, with clean deterministic transfers and checks for classification and neural net cases.

read the letter

The main thing to know is that this paper drops the continuity and smoothness assumptions that have limited SAA convergence theory, and still obtains rates on optimal values, excess risks, and distances to solution sets under a growth condition. It works by showing that uniform control of the empirical objective process in the Hoffmann-Jørgensen outer-probability sense transfers directly to those optimization quantities through elementary inequalities that use only the definitions of inf and ε-minimality. No lower semicontinuity is required for that step. They then add LILs and maximal inequalities for almost-sure and mean rates, and compose with directional differentiability of the infimum map to get n^{-1/2} weak limits. The definability, envelope, and VC-subgraph conditions are verified for the discontinuous classes in 0-1 classification, fixed neural networks, threshold regression, and non-Lipschitz ℓ_p objectives with rational p less than 1, plus practical notes on measurability. This is the substantive advance over prior continuity-based work. The deterministic transfer is general and avoids hidden regularity, which is a strength. The verifications for the listed applications are the part that makes the claims usable rather than abstract. A soft spot is that the sharp-growth condition needed for distance-to-set rates will need case-by-case checking in applications and may not hold broadly. Directional differentiability is also stated as an assumption rather than derived, so the weak-limit result is conditional on it. Neither is a flaw in the logic, just a limit on scope. This is for researchers in stochastic optimization who work with models that have jumps or non-Lipschitz behavior. Anyone extending empirical process tools to optimization will find the framework and examples worth reading. It deserves a serious referee because the core argument is grounded in standard tools applied without circularity and the applications are concrete.

Referee Report

0 major / 3 minor

Summary. The manuscript develops a rate theory for sample average approximations (SAA) that dispenses with continuity, lower semicontinuity, and smoothness assumptions on the sample objectives. Working in ℓ^∞(X) under the Hoffmann-Jørgensen outer-probability formalism, it shows that uniform control of the empirical objective process transfers deterministically via elementary inf and ε-minimality inequalities to rates on optimal values, excess risks of empirical ε-minimizers, and (under a sharp-growth condition) distances to the expected solution set. Directional differentiability of the infimum functional then yields n^{-1/2} weak limits for empirical optimal values, while LILs and maximal inequalities give outer a.s. and mean rates. The required definability, envelope, and VC-subgraph conditions are verified explicitly for discontinuous or non-Lipschitz classes arising in 0-1 classification, fixed-architecture neural networks, threshold regression, and rational ℓ_p objectives (0<p<1), with practical measurability conditions discussed.

Significance. If the central claims hold, the work meaningfully extends SAA convergence theory beyond the smooth/continuous regime to settings that arise routinely in modern stochastic optimization and machine learning. The deterministic transfer step is a clean, assumption-light contribution that relies only on the definitions of inf and ε-minimality. Explicit verification of the empirical-process hypotheses for the listed non-Lipschitz applications, together with the treatment of measurability via outer probabilities, makes the results directly applicable rather than purely abstract. The combination with directional differentiability for weak limits and with LILs for a.s. rates provides a unified framework that previous continuity-based SAA analyses do not cover.

minor comments (3)

[§4] The sharp-growth condition (used for distance-to-solution-set rates) is introduced without a concrete numerical example or a simple sufficient condition that practitioners could check; adding one in §4 or §5 would improve usability.
[§6] In the neural-network and threshold-regression applications, the precise envelope and VC-subgraph arguments are stated to hold, but the main text would benefit from a one-paragraph summary of the key definability properties rather than relegating all details to an appendix.
[§3] Notation for the outer-probability statements (e.g., the distinction between P* and P_*) is introduced early but used inconsistently in the weak-limit section; a short notational table or consistent use of a single symbol would reduce reader effort.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading, positive summary, and recommendation of minor revision. We appreciate the recognition that the deterministic transfer step and explicit verifications for non-Lipschitz applications meaningfully extend SAA theory. No specific major comments were raised, so we have no point-by-point rebuttals to provide. We will address any minor editorial or presentational suggestions in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation chain begins with uniform control of the empirical objective process in the Hoffmann-Jørgensen outer-probability sense on ℓ^∞(X), which transfers deterministically to rates on optimal values and excess risks via elementary inequalities that invoke only the definitions of inf and ε-minimality; these hold for arbitrary functions without continuity assumptions. Probabilistic rates then follow from the paper's explicit verification of envelope, definability, and VC-subgraph conditions for the target discontinuous classes, combined with standard external tools (LILs, maximal inequalities). The n^{-1/2} weak-limit result is obtained by composing the uniform convergence with directional differentiability of the infimum map, which is stated as an explicit assumption rather than derived internally. No step reduces by construction to a fitted parameter, self-definition, or self-citation chain; all load-bearing elements rest on independent external results or direct verification.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The framework rests on standard probability and empirical process tools plus domain assumptions about function classes; no free parameters or new entities are introduced.

axioms (3)

domain assumption Definability, envelope, and VC-subgraph hypotheses hold for the target discontinuous or non-Lipschitz classes
Invoked to verify the uniform control and measurability conditions for applications such as 0-1 classification and non-Lipschitz lp objectives.
domain assumption Directional differentiability of the infimum functional
Used to obtain weak limits at the n^{-1/2} scale.
domain assumption Sharp-growth condition
Required for transferring rates to distances to the expected solution set.

pith-pipeline@v0.9.0 · 5542 in / 1440 out tokens · 44025 ms · 2026-05-07T16:40:23.778551+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1]

Anthony and P

M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009

work page 2009
[2]

Banholzer, J

D. Banholzer, J. Fliege, and R. Werner. On rates of convergence for sample average approximations in the almost sure sense and in mean. Mathematical Programming, 191:307--345, 2022

work page 2022
[3]

Bolte, A

J. Bolte, A. Daniilidis, and A. Lewis. Tame functions are semismooth. Mathematical Programming, 117:5--19, 2009

work page 2009
[4]

Bolte and E

J. Bolte and E. Pauwels. Majorization--minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Mathematics of Operations Research, 41:442--465, 2016

work page 2016
[5]

Cui and J.-S

Y. Cui and J.-S. Pang. Modern Nonconvex Nondifferentiable Optimization. SIAM and MOS, 2022

work page 2022
[6]

Buccini, O

A. Buccini, O. De la Cruz Cabrera, M. Donatelli, A. Martinelli, and L. Reichel. Large-scale regression with non-convex loss and penalty. Applied Numerical Mathematics, 157:590--601, 2020

work page 2020
[7]

C. A. Floudas. Deterministic Global Optimization: Theory, Methods and Applications. Kluwer Academic Publishers, 2000

work page 2000
[8]

Locatelli and F

M. Locatelli and F. Schoen. Global Optimization: Theory, Algorithms, and Applications. SIAM and MOS, 2013

work page 2013
[9]

A. D. Ioffe. An invitation to tame optimization. SIAM Journal on Optimization, 19:1894--1917, 2009

work page 1917
[10]

S. Kim, R. Pasupathy, and S. G. Henderson. A guide to sample average approximation. In M. C. Fu, editor, Handbook of Simulation Optimization, pages 207--243. Springer, 2015

work page 2015
[11]

J. Kuelbs. A strong convergence theorem for B anach space valued random variables. Annals of Probability, 4:744--771, 1976

work page 1976
[12]

J. Kuelbs. Kolmogorov's law of the iterated logarithm for B anach space valued random variables. Illinois Journal of Mathematics, 21:784--800, 1977

work page 1977
[13]

T. Lê Loi. O-minimal structures. In Proceedings of the Centre for Mathematics and Its Applications, volume 43, pages 19--30, 2010

work page 2010
[14]

J. L. Montaña and L. M. Pardo. On the V apnik-- C hervonenkis dimension of computer programs which use transcendental elementary operations. Annals of Mathematics and Artificial Intelligence, 56:371--388, 2009

work page 2009
[15]

J. Pila. Point-Counting and the Zilber--Pink Conjecture. Cambridge University Press, 2022

work page 2022
[16]

A. Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30:169--186, 1991

work page 1991
[17]

Shapiro, D

A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, third edition, 2021

work page 2021
[18]

B. E. Hansen. Sample splitting and threshold estimation. Econometrica, 68:575--603, 2000

work page 2000
[19]

H. L. Koul, L. Qian, and D. Surgailis. Asymptotics of M -estimators in two-phase linear regression models. Stochastic Processes and their Applications, 103:123--154, 2003

work page 2003
[20]

Li and H.-T

L. Li and H.-T. Lin. Optimizing 0/1 loss for perceptrons by random coordinate descent. In Proceedings of the International Joint Conference on Neural Networks, pages 749--754. IEEE, 2007

work page 2007
[21]

T. T. Nguyen and S. Sanner. Algorithms for direct 0 -- 1 loss optimization in binary classification. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 1085--1093, 2013

work page 2013
[22]

van den Dries

L. van den Dries. Tame Topology and O-minimal Structures. Cambridge University Press, 1998

work page 1998
[23]

A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, second edition, 2023

work page 2023
[24]

Westerhout, T

J. Westerhout, T. T. Nguyen, X. Guo, and H. D. Nguyen. On the asymptotic distribution of the minimum empirical risk. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 52869--52902, 2024

work page 2024
[25]

Y.-F. Ye, C. Ying, Y.-H. Shao, C.-N. Li, and Y.-J. Chen. Robust and sparse LP -norm support vector regression. Journal of Advanced Computational Intelligence and Intelligent Informatics, 21:989--997, 2017

work page 2017
[26]

J. E. Yukich. The law of the iterated logarithm for empirical processes. In U. Haagerup, J. Hoffmann-Jørgensen, and M. B. Marcus, editors, Probability in Banach Spaces 6, pages 265--282. Birkhäuser, 1990

work page 1990

[1] [1]

Anthony and P

M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009

work page 2009

[2] [2]

Banholzer, J

D. Banholzer, J. Fliege, and R. Werner. On rates of convergence for sample average approximations in the almost sure sense and in mean. Mathematical Programming, 191:307--345, 2022

work page 2022

[3] [3]

Bolte, A

J. Bolte, A. Daniilidis, and A. Lewis. Tame functions are semismooth. Mathematical Programming, 117:5--19, 2009

work page 2009

[4] [4]

Bolte and E

J. Bolte and E. Pauwels. Majorization--minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Mathematics of Operations Research, 41:442--465, 2016

work page 2016

[5] [5]

Cui and J.-S

Y. Cui and J.-S. Pang. Modern Nonconvex Nondifferentiable Optimization. SIAM and MOS, 2022

work page 2022

[6] [6]

Buccini, O

A. Buccini, O. De la Cruz Cabrera, M. Donatelli, A. Martinelli, and L. Reichel. Large-scale regression with non-convex loss and penalty. Applied Numerical Mathematics, 157:590--601, 2020

work page 2020

[7] [7]

C. A. Floudas. Deterministic Global Optimization: Theory, Methods and Applications. Kluwer Academic Publishers, 2000

work page 2000

[8] [8]

Locatelli and F

M. Locatelli and F. Schoen. Global Optimization: Theory, Algorithms, and Applications. SIAM and MOS, 2013

work page 2013

[9] [9]

A. D. Ioffe. An invitation to tame optimization. SIAM Journal on Optimization, 19:1894--1917, 2009

work page 1917

[10] [10]

S. Kim, R. Pasupathy, and S. G. Henderson. A guide to sample average approximation. In M. C. Fu, editor, Handbook of Simulation Optimization, pages 207--243. Springer, 2015

work page 2015

[11] [11]

J. Kuelbs. A strong convergence theorem for B anach space valued random variables. Annals of Probability, 4:744--771, 1976

work page 1976

[12] [12]

J. Kuelbs. Kolmogorov's law of the iterated logarithm for B anach space valued random variables. Illinois Journal of Mathematics, 21:784--800, 1977

work page 1977

[13] [13]

T. Lê Loi. O-minimal structures. In Proceedings of the Centre for Mathematics and Its Applications, volume 43, pages 19--30, 2010

work page 2010

[14] [14]

J. L. Montaña and L. M. Pardo. On the V apnik-- C hervonenkis dimension of computer programs which use transcendental elementary operations. Annals of Mathematics and Artificial Intelligence, 56:371--388, 2009

work page 2009

[15] [15]

J. Pila. Point-Counting and the Zilber--Pink Conjecture. Cambridge University Press, 2022

work page 2022

[16] [16]

A. Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30:169--186, 1991

work page 1991

[17] [17]

Shapiro, D

A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, third edition, 2021

work page 2021

[18] [18]

B. E. Hansen. Sample splitting and threshold estimation. Econometrica, 68:575--603, 2000

work page 2000

[19] [19]

H. L. Koul, L. Qian, and D. Surgailis. Asymptotics of M -estimators in two-phase linear regression models. Stochastic Processes and their Applications, 103:123--154, 2003

work page 2003

[20] [20]

Li and H.-T

L. Li and H.-T. Lin. Optimizing 0/1 loss for perceptrons by random coordinate descent. In Proceedings of the International Joint Conference on Neural Networks, pages 749--754. IEEE, 2007

work page 2007

[21] [21]

T. T. Nguyen and S. Sanner. Algorithms for direct 0 -- 1 loss optimization in binary classification. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 1085--1093, 2013

work page 2013

[22] [22]

van den Dries

L. van den Dries. Tame Topology and O-minimal Structures. Cambridge University Press, 1998

work page 1998

[23] [23]

A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, second edition, 2023

work page 2023

[24] [24]

Westerhout, T

J. Westerhout, T. T. Nguyen, X. Guo, and H. D. Nguyen. On the asymptotic distribution of the minimum empirical risk. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 52869--52902, 2024

work page 2024

[25] [25]

Y.-F. Ye, C. Ying, Y.-H. Shao, C.-N. Li, and Y.-J. Chen. Robust and sparse LP -norm support vector regression. Journal of Advanced Computational Intelligence and Intelligent Informatics, 21:989--997, 2017

work page 2017

[26] [26]

J. E. Yukich. The law of the iterated logarithm for empirical processes. In U. Haagerup, J. Hoffmann-Jørgensen, and M. B. Marcus, editors, Probability in Banach Spaces 6, pages 265--282. Birkhäuser, 1990

work page 1990