pith. machine review for the scientific record.

arxiv: 2605.14345 · v1 · submitted 2026-05-14 · 🧮 math.OC

Convergence of difference inclusions via a diameter criterion

Pith reviewed 2026-05-15 02:20 UTC · model grok-4.3

classification 🧮 math.OC
keywords diameter · convergence · criterion · bounded · difference · discrete · framework · limit

The pith

A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Consider a sequence of points in which each update adds a direction selected from a set-valued map plus a noise term. The paper proves that a bounded sequence of this kind converges whenever its inter-iterate diameter is controlled by the variation of a continuous potential function, and that the limit is a critical point of a scaled outer limit of the update map. To establish the needed diameter bound, the iterates are projected onto a stratification, a partition of the space into layers, and the potential is shown to decrease up to errors that sum to a finite total. This discrete argument covers inexact and stochastic subgradient steps as well as momentum under step sizes of order $1/k$, provided the objective is locally Lipschitz and definable in a polynomially bounded o-minimal structure. No continuous-time approximation is used.
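As a rough illustration of such an update rule, the sketch below runs a noisy subgradient iteration on the one-dimensional objective f(x) = |x| with step size 1/k. The objective, the uniform noise model, and the names `subgradient_abs` and `noisy_subgradient_run` are assumptions chosen for this example; they are not taken from the paper.

```python
import random


def subgradient_abs(x):
    """Return one element of the subdifferential of f(x) = |x| at x."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # any value in [-1, 1] is admissible at 0; 0.0 is one selection


def noisy_subgradient_run(x0, steps, noise_scale=0.5, seed=0):
    """Iterate x_{k+1} = x_k - (1/k) * (g_k + noise_k) and return all iterates.

    g_k is a selection from the set-valued map (here the subdifferential of |x|)
    and noise_k is drawn uniformly from [-noise_scale, noise_scale].
    The noise model is an assumption made for this illustration.
    """
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for k in range(1, steps + 1):
        step = 1.0 / k                                  # step size of order 1/k
        g = subgradient_abs(x)                          # selection from the map
        noise = rng.uniform(-noise_scale, noise_scale)  # bounded noise term
        x = x - step * (g + noise)
        trajectory.append(x)
    return trajectory


trajectory = noisy_subgradient_run(x0=2.0, steps=2000)
print(abs(trajectory[-1]))  # distance of the last iterate from the minimizer 0
```

Because the noise here is bounded by 0.5, every step away from 0 still moves toward 0, so the iterates settle near the only critical point. This toy case says nothing about the general setting the paper treats.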

Core claim

Combining the diameter criterion with a diameter estimate obtained from this framework yields convergence of common first-order optimization methods under step sizes of order 1/k. The guarantees cover inexact and stochastic subgradient methods, as well as the momentum method, for locally Lipschitz objectives definable in polynomially bounded o-minimal structures.
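For orientation, the methods named in the core claim can be written in common textbook forms, as sketched below. The notation is assumed for illustration and may differ from the paper's: $\partial f$ is taken to be the Clarke subdifferential, $\xi_k$ stands for noise or inexactness, and the constants $c$ and $\beta$ as well as the averaging form of the momentum recursion are hypothetical choices. The `align*` environment requires the amsmath package.

```latex
% Notation assumed for illustration; not taken from the paper.
% f: locally Lipschitz objective, \partial f: Clarke subdifferential,
% \xi_k: noise or inexactness, c > 0 and \beta \in [0,1): hypothetical constants.
\begin{align*}
  \text{step size:} \quad & \alpha_k = \frac{c}{k}, \\
  \text{noisy subgradient:} \quad & x_{k+1} = x_k - \alpha_k (g_k + \xi_k),
      \qquad g_k \in \partial f(x_k), \\
  \text{momentum:} \quad & y_{k+1} = \beta y_k + (1-\beta)(g_k + \xi_k),
      \qquad x_{k+1} = x_k - \alpha_k y_{k+1}.
\end{align*}
% The subgradient step rewritten as a difference inclusion with a noise term:
\[
  x_{k+1} - x_k \in -\alpha_k \, \partial f(x_k) + e_k,
  \qquad e_k = -\alpha_k \xi_k .
\]
```

In this form the increment is visibly the sum of a selection from a set-valued map and a noise term, which matches the structure described in the abstract.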

Load-bearing premise

The objectives must be locally Lipschitz and definable in polynomially bounded o-minimal structures so that the stratified descent framework produces a summable error term in the potential decrease.
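To make the function class more concrete, the block below lists a few standard examples of what does and does not fall inside a polynomially bounded o-minimal structure. These examples are general facts from o-minimal geometry, offered as illustration; none of them is drawn from the paper.

```latex
% Illustrative examples only; not taken from the paper.
\begin{itemize}
  \item Semialgebraic functions, such as $f(x) = \lVert Ax - b \rVert_1$ or
        $f(x) = \max\{0, x\}$, are definable in the structure of semialgebraic
        sets, which is polynomially bounded. Both examples are locally Lipschitz.
  \item Globally subanalytic functions, such as $\exp$ restricted to a compact
        interval, are definable in $\mathbb{R}_{\mathrm{an}}$, which is also
        polynomially bounded.
  \item The unrestricted exponential $x \mapsto e^{x}$ on $\mathbb{R}$ is
        definable in the o-minimal structure $\mathbb{R}_{\exp}$, but that
        structure is not polynomially bounded.
\end{itemize}
```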

Original abstract

We study discrete dynamics governed by a difference inclusion whose increment is the sum of a selection from a set-valued map and a noise term. For any bounded realization, convergence follows once the inter-iterate diameter is controlled by the variation of a continuous potential. The limit point is then critical for a scaled outer limit of the update map. To certify this diameter criterion, we develop a stratified descent framework: we project iterates onto a suitable stratification and track a potential that decreases up to a summable error. Combining the diameter criterion with a diameter estimate obtained from this framework yields convergence of common first-order optimization methods under step sizes of order $1/k$. The guarantees cover inexact and stochastic subgradient methods, as well as the momentum method, for locally Lipschitz objectives definable in polynomially bounded o-minimal structures. Our arguments are entirely discrete, with no appeal to continuous-time approximations.
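One informal way to get a feel for the relationship between iterate diameter and potential variation is to measure both over consecutive windows of a trajectory. The sketch below is a diagnostic under assumed definitions: the windowing scheme, the use of max minus min as the "variation", the test objective |x| + |y|, and all function names are hypothetical choices, and the paper's precise criterion may be stated differently.

```python
import math


def window_diameter(points):
    """Largest Euclidean distance between any two points in the window."""
    return max((math.dist(p, q) for p in points for q in points), default=0.0)


def potential_variation(points, potential):
    """Max minus min of the potential over the window (an assumed notion of variation)."""
    values = [potential(p) for p in points]
    return max(values) - min(values)


def diameter_vs_variation(trajectory, potential, window):
    """Return (start index, diameter, variation) for consecutive full windows."""
    rows = []
    for start in range(0, len(trajectory) - window + 1, window):
        chunk = trajectory[start:start + window]
        rows.append(
            (start, window_diameter(chunk), potential_variation(chunk, potential))
        )
    return rows


def f(p):
    """Illustrative nonsmooth objective, used here as the potential."""
    return abs(p[0]) + abs(p[1])


def sign(t):
    """Sign of t as -1, 0, or 1; a valid subgradient selection for |t|."""
    return (t > 0) - (t < 0)


# Deterministic subgradient steps with step size 1/k on f.
traj = [(1.0, -2.0)]
for k in range(1, 401):
    x, y = traj[-1]
    traj.append((x - sign(x) / k, y - sign(y) / k))

rows = diameter_vs_variation(traj, f, window=100)
for start, diam, var in rows:
    print(start, diam, var)
```

Comparing the two columns window by window gives a rough empirical sense of how iterate spread relates to changes in the potential; it does not verify the criterion itself.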

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on domain assumptions about the function class and the existence of a suitable stratification; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption: objectives are locally Lipschitz and definable in polynomially bounded o-minimal structures.
    Invoked to guarantee that the stratified descent produces a summable error in potential decrease.

pith-pipeline@v0.9.0 · 5440 in / 1170 out tokens · 109455 ms · 2026-05-15T02:20:51.712683+00:00 · methodology
