Recognition: 3 theorem links
· Lean TheoremConvergence of difference inclusions via a diameter criterion
Pith reviewed 2026-05-15 02:20 UTC · model grok-4.3
The pith
A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Combining the diameter criterion with a diameter estimate obtained from this framework yields convergence of common first-order optimization methods under step sizes of order 1/k. The guarantees cover inexact and stochastic subgradient methods, as well as the momentum method, for locally Lipschitz objectives definable in polynomially bounded o-minimal structures.
Load-bearing premise
The objectives must be locally Lipschitz and definable in polynomially bounded o-minimal structures so that the stratified descent framework produces a summable error term in the potential decrease.
read the original abstract
We study discrete dynamics governed by a difference inclusion whose increment is the sum of a selection from a set-valued map and a noise term. For any bounded realization, convergence follows once the inter-iterate diameter is controlled by the variation of a continuous potential. The limit point is then critical for a scaled outer limit of the update map. To certify this diameter criterion, we develop a stratified descent framework: we project iterates onto a suitable stratification and track a potential that decreases up to a summable error. Combining the diameter criterion with a diameter estimate obtained from this framework yields convergence of common first-order optimization methods under step sizes of order $1/k$. The guarantees cover inexact and stochastic subgradient methods, as well as the momentum method, for locally Lipschitz objectives definable in polynomially bounded o-minimal structures. Our arguments are entirely discrete, with no appeal to continuous-time approximations.
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Objectives are locally Lipschitz and definable in polynomially bounded o-minimal structures
Reference graph
Works this paper leans on
-
[1]
Combettes, Patrick L , booktitle=. Fej. 2008 , publisher=
work page 2008
-
[2]
A weak-to-strong convergence principle for Fej
Bauschke, Heinz H and Combettes, Patrick L , journal=. A weak-to-strong convergence principle for Fej. 2001 , publisher=
work page 2001
-
[3]
Festschrift David Hilbert zu Seinem Sechzigsten Geburtstag am 23
Fej. Festschrift David Hilbert zu Seinem Sechzigsten Geburtstag am 23. Januar 1922 , pages=. 1922 , publisher=
work page 1922
- [5]
- [6]
-
[7]
SIAM Journal on Optimization , volume=
Convergence of the gradient sampling algorithm for nonsmooth nonconvex optimization , author=. SIAM Journal on Optimization , volume=. 2007 , publisher=
work page 2007
-
[8]
Seminaire de probabilites XXXIII , pages=
Dynamics of stochastic approximation algorithms , author=. Seminaire de probabilites XXXIII , pages=. 2006 , publisher=
work page 2006
-
[9]
Splitting methods with variable metric for Kurdyka--
Frankel, Pierre and Garrigos, Guillaume and Peypouquet, Juan , journal=. Splitting methods with variable metric for Kurdyka--. 2015 , publisher=
work page 2015
-
[10]
Exploring artificial intelligence in the new millennium , volume=
Understanding belief propagation and its generalizations , author=. Exploring artificial intelligence in the new millennium , volume=
-
[11]
Correctness of local probability propagation in graphical models with loops , author=. Neural computation , volume=. 2000 , publisher=
work page 2000
-
[12]
Journal of Fourier Analysis and Applications , volume=
A randomized Kaczmarz algorithm with exponential convergence , author=. Journal of Fourier Analysis and Applications , volume=. 2009 , publisher=
work page 2009
-
[13]
Applied and computational harmonic analysis , volume=
Iterative hard thresholding for compressed sensing , author=. Applied and computational harmonic analysis , volume=. 2009 , publisher=
work page 2009
- [14]
-
[15]
Computer networks and ISDN systems , volume=
The anatomy of a large-scale hypertextual web search engine , author=. Computer networks and ISDN systems , volume=. 1998 , publisher=
work page 1998
-
[16]
Q-learning , author=. Machine learning , volume=. 1992 , publisher=
work page 1992
-
[17]
Learning to predict by the methods of temporal differences , author=. Machine learning , volume=. 1988 , publisher=
work page 1988
-
[18]
IEEE Transactions on automatic control , volume=
Consensus problems in networks of agents with switching topology and time-delays , author=. IEEE Transactions on automatic control , volume=. 2004 , publisher=
work page 2004
-
[19]
IEEE Transactions on automatic control , volume=
Coordination of groups of mobile autonomous agents using nearest neighbor rules , author=. IEEE Transactions on automatic control , volume=. 2003 , publisher=
work page 2003
-
[20]
Journal of the royal statistical society: series B (methodological) , volume=
Maximum likelihood from incomplete data via the EM algorithm , author=. Journal of the royal statistical society: series B (methodological) , volume=. 1977 , publisher=
work page 1977
-
[21]
SIAM Journal on Optimization , volume=
Efficiency of coordinate descent methods on huge-scale optimization problems , author=. SIAM Journal on Optimization , volume=. 2012 , publisher=
work page 2012
-
[22]
Mathematical Programming , volume=
Proximal alternating linearized minimization for nonconvex and nonsmooth problems , author=. Mathematical Programming , volume=. 2014 , publisher=
work page 2014
-
[23]
SIAM journal on imaging sciences , volume=
A fast iterative shrinkage-thresholding algorithm for linear inverse problems , author=. SIAM journal on imaging sciences , volume=. 2009 , publisher=
work page 2009
-
[24]
SIAM Journal on Optimization , volume=
Incremental subgradient methods for nondifferentiable optimization , author=. SIAM Journal on Optimization , volume=. 2001 , publisher=
work page 2001
-
[25]
Stochastic Processes and their Applications , volume=
Convergence and convergence rate of stochastic gradient search in the case of multiple and non-isolated extrema , author=. Stochastic Processes and their Applications , volume=. 2015 , publisher=
work page 2015
-
[26]
Mathematical Programming , volume=
Optimization of Lipschitz continuous functions , author=. Mathematical Programming , volume=. 1977 , publisher=
work page 1977
-
[27]
arXiv preprint arXiv:2305.05828 , year=
Convergence of a normal map-based prox-sgd method under the kl inequality , author=. arXiv preprint arXiv:2305.05828 , year=
-
[28]
International conference on machine learning , pages=
Momentum improves normalized sgd , author=. International conference on machine learning , pages=. 2020 , organization=
work page 2020
-
[29]
Stochastic approximation: a dynamical systems viewpoint , author=. 2008 , publisher=
work page 2008
-
[30]
arXiv preprint arXiv:2405.16954 , year=
Convergence of SGD with momentum in the nonconvex case: A time window-based analysis , author=. arXiv preprint arXiv:2405.16954 , year=
-
[31]
Bulletin of Mathematical Sciences , volume=
\ Euclidean, metric, and Wasserstein \ gradient flows: an overview , author=. Bulletin of Mathematical Sciences , volume=. 2017 , publisher=
work page 2017
-
[32]
Cauchy, Augustin , journal=. M
-
[33]
Nonsmooth optimization , pages=
Subgradient methods: a survey of Soviet research , author=. Nonsmooth optimization , pages=
-
[34]
Geometric theory of dynamical systems: an introduction , author=. 2012 , publisher=
work page 2012
-
[35]
Metric structures for Riemannian and non-Riemannian spaces , author=. 1999 , publisher=
work page 1999
-
[36]
Quasi-convex decomposition in o-minimal structures: application to the gradient conjecture , author=. 2001 , publisher=
work page 2001
-
[37]
Russian Mathematical Surveys , volume=
Functions whose gradient is bounded by the reciprocal distance from the boundary of their domain , author=. Russian Mathematical Surveys , volume=. 1974 , publisher=
work page 1974
-
[38]
Journal of Mathematical Analysis and Applications , volume=
Extension of Lipschitz functions , author=. Journal of Mathematical Analysis and Applications , volume=. 1980 , publisher=
work page 1980
-
[39]
Hassler Whitney Collected Papers , pages=
Tangents to an analytic variety , author=. Hassler Whitney Collected Papers , pages=. 1992 , publisher=
work page 1992
-
[40]
Handbook of geometry and topology of singularities I , pages=
Stratification theory , author=. Handbook of geometry and topology of singularities I , pages=. 2020 , publisher=
work page 2020
-
[41]
Illinois Journal of Mathematics , volume=
John functions, quadratic integral forms and o-minimal structures , author=. Illinois Journal of Mathematics , volume=. 2002 , publisher=
work page 2002
-
[42]
Mathematica Scandinavica , pages=
Quasihyperbolic geodesics in John domains , author=. Mathematica Scandinavica , pages=. 1989 , publisher=
work page 1989
-
[43]
Annales de l'institut Fourier , volume=
Lipschitz properties of semi-analytic sets , author=. Annales de l'institut Fourier , volume=
-
[44]
Annales scientifiques de l'Ecole normale sup
Lipschitz stratification of subanalytic sets , author=. Annales scientifiques de l'Ecole normale sup
- [45]
-
[46]
Illinois Journal of Mathematics , volume=
Verdier and strict Thom stratifications in o-minimal structures , author=. Illinois Journal of Mathematics , volume=. 1998 , publisher=
work page 1998
-
[47]
Foundations of Computational Mathematics , pages=
Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization , author=. Foundations of Computational Mathematics , pages=. 2025 , publisher=
work page 2025
-
[48]
Stratifications de Whitney et th
Verdier, Jean-Louis , journal=. Stratifications de Whitney et th. 1976 , publisher=
work page 1976
- [49]
-
[50]
Proximal smoothness and the lower-C2 property , author=. J. Convex Anal , volume=
-
[51]
On a subanalytic stratification satisfying a Whitney property with exponent 1 , author=. Real Algebraic Geometry: Proceedings of the Conference held in Rennes, France, June 24--28, 1991 , pages=. 2006 , organization=
work page 1991
- [52]
- [53]
-
[54]
Annales Polonici Mathematici , volume=
A decomposition of a set definable in an o-minimal structure into perfectly situated sets , author=. Annales Polonici Mathematici , volume=
-
[55]
Lipschitz stratifications in o-minimal structures , author=. Annales Scientifiques de l'
-
[56]
Annales de l'institut Fourier , volume=
A linear extension operator for Whitney fields on closed o-minimal sets , author=. Annales de l'institut Fourier , volume=
-
[57]
Vietnam Journal of Mathematics , pages=
Revisiting subgradient method: Complexity and convergence beyond Lipschitz continuity , author=. Vietnam Journal of Mathematics , pages=. 2024 , publisher=
work page 2024
-
[58]
Mathematical Programming , volume=
On the projected subgradient method for nonsmooth convex optimization in a Hilbert space , author=. Mathematical Programming , volume=. 1998 , publisher=
work page 1998
-
[59]
Journal of Dynamical and Control Systems , volume=
On Morse theory for piecewise smooth functions , author=. Journal of Dynamical and Control Systems , volume=. 1997 , publisher=
work page 1997
-
[60]
Computational Optimization and Applications , volume=
A property of piecewise smooth functions , author=. Computational Optimization and Applications , volume=. 2003 , publisher=
work page 2003
-
[61]
Nonlinear Analysis: Theory, Methods & Applications , volume=
On almost smooth functions and piecewise smooth functions , author=. Nonlinear Analysis: Theory, Methods & Applications , volume=. 2007 , publisher=
work page 2007
-
[62]
An introduction to optimization on smooth manifolds , author=. 2023 , publisher=
work page 2023
-
[63]
Mathematical Programming , volume=
Stochastic algorithms with geometric step decay converge linearly on sharp functions , author=. Mathematical Programming , volume=. 2024 , publisher=
work page 2024
-
[64]
Journal of computational and Applied Mathematics , volume=
Genetic algorithms for modelling and optimisation , author=. Journal of computational and Applied Mathematics , volume=. 2005 , publisher=
work page 2005
-
[65]
Linear Algebra and its Applications , volume=
On the least squares distance between affine subspaces , author=. Linear Algebra and its Applications , volume=. 1996 , publisher=
work page 1996
-
[66]
Mathematical Programming , pages=
No dimension-free deterministic algorithm computes approximate stationarities of lipschitzians , author=. Mathematical Programming , pages=. 2024 , publisher=
work page 2024
-
[67]
Advances in Neural Information Processing Systems , volume=
Subquadratic overparameterization for shallow neural networks , author=. Advances in Neural Information Processing Systems , volume=
-
[68]
Applied and Computational Harmonic Analysis , volume=
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks , author=. Applied and Computational Harmonic Analysis , volume=. 2022 , publisher=
work page 2022
-
[69]
International Conference on Machine Learning , pages=
On the proof of global convergence of gradient descent for deep relu networks with linear widths , author=. International Conference on Machine Learning , pages=. 2021 , organization=
work page 2021
-
[70]
Some NP-complete problems in quadratic and nonlinear programming , author=
-
[71]
Proceedings of the IEEE , volume=
Gradient-based learning applied to document recognition , author=. Proceedings of the IEEE , volume=. 1998 , publisher=
work page 1998
-
[72]
arXiv preprint arXiv:1909.12292 , year=
Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow relu networks , author=. arXiv preprint arXiv:1909.12292 , year=
-
[73]
Advances in neural information processing systems , volume=
On the convergence rate of training recurrent neural networks , author=. Advances in neural information processing systems , volume=
-
[74]
IEEE Journal on Selected Areas in Information Theory , volume=
Toward moderate overparameterization: Global convergence guarantees for training shallow neural networks , author=. IEEE Journal on Selected Areas in Information Theory , volume=. 2020 , publisher=
work page 2020
-
[75]
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
Gradient descent provably optimizes over-parameterized neural networks , author=. arXiv preprint arXiv:1810.02054 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[76]
International conference on machine learning , pages=
Gradient descent finds global minima of deep neural networks , author=. International conference on machine learning , pages=. 2019 , organization=
work page 2019
-
[77]
Advances in neural information processing systems , volume=
Neural tangent kernel: Convergence and generalization in neural networks , author=. Advances in neural information processing systems , volume=
-
[78]
Journal of Machine Learning Research , volume=
Adam-family methods for nonsmooth optimization with convergence guarantees , author=. Journal of Machine Learning Research , volume=
-
[79]
Journal of mathematical analysis and applications , volume=
General convergence results for stochastic approximations via weak convergence theory , author=. Journal of mathematical analysis and applications , volume=. 1977 , publisher=
work page 1977
-
[80]
International Conference on Algorithmic Learning Theory , pages=
Provable Accelerated Convergence of Nesterov’s Momentum for Deep ReLU Neural Networks , author=. International Conference on Algorithmic Learning Theory , pages=. 2024 , organization=
work page 2024
-
[81]
SIAM Journal on Optimization , volume=
Nonconvex robust low-rank matrix recovery , author=. SIAM Journal on Optimization , volume=. 2020 , publisher=
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.