pith. sign in

arxiv: 2606.17285 · v1 · pith:OEHJO6IYnew · submitted 2026-06-15 · 🧮 math.OC · cs.DS

Adaptive Proximal Methods for Weakly Convex Optimization with Unknown Parameter: Deterministic and Stochastic Guarantees

Pith reviewed 2026-06-27 02:06 UTC · model grok-4.3

classification 🧮 math.OC cs.DS
keywords weakly convex optimizationadaptive proximal methodsstochastic optimizationiteration complexityMoreau envelopesubgradient stationary pointproximal oracle
0
0 comments X

The pith

Adaptive Prox-Guided Scheme minimizes ρ-weakly convex functions with unknown ρ at O(ε^{-2}) iteration complexity in deterministic and stochastic settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the Adaptive Prox-Guided Scheme (APS) for minimizing nonsmooth, nonconvex objectives that satisfy ρ-weak convexity when the parameter ρ is unknown to the algorithm. APS adjusts the proximal parameter online and in both directions by testing for sufficient descent, without assuming global Lipschitz continuity or smoothness of the objective. In the deterministic case the method produces an ε-subgradient stationary point after O(ε^{-2}) iterations. In the stochastic case the same order of iterations drives the gradient of the Moreau envelope below ε with high probability, even when function-difference estimates are biased and heavy-tailed and the stochastic proximal oracle is accurate with only constant probability when its parameter lies below 1/(2ρ).

Core claim

APS is a one-trial proximal algorithm that adapts its proximal parameter bidirectionally through a descent test; this adaptation yields an O(ε^{-2}) iteration bound for an ε-subgradient stationary point in the deterministic setting and a high-probability O(ε^{-2}) bound on the Moreau-envelope gradient in the stochastic setting, under the stated weak oracle assumptions that allow biased heavy-tailed estimates and limited-accuracy proximal oracles outside the regime where the parameter is below 1/(2ρ).

What carries the argument

Adaptive Prox-Guided Scheme (APS), which tunes the proximal parameter online via a descent test to exploit favorable local structure without prior knowledge of ρ.

If this is right

  • Optimization proceeds without any a priori bound on the weak-convexity parameter.
  • The same iteration order holds when function-value estimates are biased and heavy-tailed.
  • Only constant-probability accuracy of the proximal oracle is needed inside the regime parameter < 1/(2ρ).
  • The method applies to objectives that are neither globally Lipschitz nor smooth.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bidirectional adaptation rule could be ported to other parameter-tuning problems such as unknown smoothness constants in smooth nonconvex optimization.
  • The high-probability analysis under minimal oracle accuracy may extend to variance-reduced or momentum-based variants of proximal methods.
  • The descent-test mechanism suggests a template for online parameter selection in composite problems with multiple unknown constants.

Load-bearing premise

The stochastic proximal oracle is required to be sufficiently accurate with constant probability precisely when the proximal parameter lies below 1/(2ρ) and may be arbitrary otherwise.

What would settle it

A concrete instance in which the proximal oracle satisfies the accuracy condition only for parameters above 1/(2ρ) yet APS still produces the claimed high-probability O(ε^{-2}) bound on the Moreau-envelope gradient.

Figures

Figures reproduced from arXiv: 2606.17285 by Miaolan Xie.

Figure 1
Figure 1. Figure 1: The two-regime difficulty. The proximal subproblem behaves qualitatively differently across the [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
read the original abstract

Many nonsmooth, nonconvex objectives in learning and signal recovery are $\rho$-weakly convex. We minimize such a function in deterministic and stochastic settings when the weak-convexity parameter $\rho$ is unknown. The objective is not required to be globally Lipschitz continuous or smooth. We propose the Adaptive Prox-Guided Scheme (APS), a one-trial proximal algorithm that adapts the proximal parameter online and bidirectionally through a descent test, allowing it to exploit favorable local structure. In the deterministic setting, APS obtains an $O(\varepsilon^{-2})$ iteration complexity for producing an $\varepsilon$-subgradient stationary point. In the stochastic setting, APS achieves a high-probability $O(\varepsilon^{-2})$ iteration bound for driving the Moreau-envelope gradient below $\varepsilon$. This result holds under deliberately weak oracle assumptions: the function-difference estimates may be biased and heavy-tailed, and the stochastic proximal oracle need only be sufficiently accurate with constant probability when the proximal parameter lies below $1/(2\rho)$ (unknown to the algorithm), and can be arbitrary otherwise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes the Adaptive Prox-Guided Scheme (APS), a proximal algorithm that adapts the proximal parameter bidirectionally via a descent test for minimizing ρ-weakly convex functions when ρ is unknown. It establishes an O(ε^{-2}) iteration complexity for an ε-subgradient stationary point in the deterministic setting and a high-probability O(ε^{-2}) bound on the Moreau-envelope gradient norm in the stochastic setting. The stochastic result holds under weak oracle assumptions allowing biased heavy-tailed function-difference estimates and a stochastic proximal oracle that is sufficiently accurate with constant probability only when the proximal parameter lies below 1/(2ρ) (and arbitrary otherwise).

Significance. If the proofs hold, the contribution is significant for practical nonsmooth nonconvex optimization where the weak-convexity parameter is typically unknown and stochastic oracles are noisy. The bidirectional adaptation and the ability to obtain high-probability guarantees under deliberately relaxed oracle conditions (including the conditional proximal-oracle accuracy) distinguish the work from prior adaptive proximal methods. The deterministic result is parameter-free in the stated sense and the stochastic high-probability bound is obtained without requiring global Lipschitzness or unbiased oracles.

minor comments (2)
  1. The precise statement of the stochastic proximal-oracle assumption (tied to the threshold 1/(2ρ)) should be restated in the main theorem statement rather than only in the abstract and assumption section, to make the load-bearing hypothesis immediately visible to readers.
  2. Notation for the Moreau envelope gradient and the subgradient stationarity measure should be unified across the deterministic and stochastic sections to avoid minor confusion when comparing the two complexity results.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. The referee summary accurately captures the APS algorithm, its bidirectional adaptation, the O(ε^{-2}) guarantees, and the deliberately weak oracle assumptions in both settings. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; bounds derived under external oracle assumptions

full rationale

The paper states explicit complexity results (O(ε^{-2}) deterministic and high-probability stochastic) under deliberately weak, externally stated oracle assumptions on function-difference estimates and conditional proximal-oracle accuracy. No equations, reductions, or self-citations are visible that equate the target bounds to fitted inputs, self-definitions, or prior author results by construction. The adaptation rule and high-probability control are presented as consequences of the stated assumptions rather than tautological with them.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only abstract available; ledger populated from stated assumptions in abstract. No free parameters or invented entities are visible.

axioms (2)
  • domain assumption Objective is ρ-weakly convex for some unknown ρ > 0
    Central modeling assumption stated in first sentence of abstract.
  • domain assumption Stochastic proximal oracle is accurate with constant probability only when proximal parameter < 1/(2ρ)
    Explicitly invoked to obtain the stochastic guarantee.

pith-pipeline@v0.9.1-grok · 5715 in / 1313 out tokens · 37333 ms · 2026-06-27T02:06:13.883876+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 4 canonical work pages

  1. [1]

    Alacaoglu, Y

    A. Alacaoglu, Y. Malitsky, and V. Cevher. Convergence of adaptive algorithms for constrained weakly convex optimization. InAdvances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 14214–14225, 2021. Also available asarXiv:2006.06650

  2. [2]

    K. Azuma. Weighted sums of certain dependent random variables.Tˆ ohoku Mathematical Journal, 19(3):357–367, 1967

  3. [3]

    A. S. Berahas, M. Xie, and B. Zhou. A sequential quadratic programming method with high-probability complexity bounds for nonlinear equality-constrained stochastic optimization.SIAM Journal on Opti- mization, 35(1):240–269, 2025.doi:10.1137/23M1549006

  4. [4]

    Blanchet, C

    J. Blanchet, C. Cartis, M. Menickelly, and K. Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019

  5. [5]

    B¨ ohm and S

    A. B¨ ohm and S. J. Wright. Variable smoothing for weakly convex composite functions.Journal of Optimization Theory and Applications, 188:628–649, 2021

  6. [6]

    E. J. Cand` es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis?Journal of the ACM, 58(3):Article 11, 2011

  7. [7]

    L. Cao, A. S. Berahas, and K. Scheinberg. First- and second-order high probability complexity bounds for trust-region methods with noisy oracles.Mathematical Programming, 207(1–2):55–106, 2024. 23

  8. [8]

    Cartis and K

    C. Cartis and K. Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models.Mathematical Programming, 169(2):337–375, 2018

  9. [9]

    Chandrasekaran, S

    V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition.SIAM Journal on Optimization, 21(2):572–596, 2011

  10. [10]

    Charisopoulos, Y

    V. Charisopoulos, Y. Chen, D. Davis, M. D´ ıaz, L. Ding, and D. Drusvyatskiy. Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence.Foundations of Computational Mathematics, 21(6):1505–1593, 2021

  11. [11]

    R. Chen, M. Menickelly, and K. Scheinberg. Stochastic optimization using a trust-region method and random models.Mathematical Programming, 169(2):447–487, 2018

  12. [12]

    Davis and D

    D. Davis and D. Drusvyatskiy. Stochastic model-based minimization of weakly convex functions.SIAM Journal on Optimization, 29(1):207–239, 2019

  13. [13]

    Davis and B

    D. Davis and B. Grimmer. Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems.SIAM Journal on Optimization, 29(3):1908–1930, 2019

  14. [14]

    Deng and W

    Q. Deng and W. Gao. A smooth approximation framework for weakly convex optimization. Preprint, arXiv:2512.09720, 2025

  15. [15]

    Drusvyatskiy

    D. Drusvyatskiy. The proximal point method revisited.SIAG/OPT Views and News, 26(1):1–8, 2018. Also available asarXiv:1712.06038

  16. [16]

    Drusvyatskiy and C

    D. Drusvyatskiy and C. Paquette. Efficiency of minimizing compositions of convex functions and smooth maps.Mathematical Programming, 178(1–2):503–558, 2019

  17. [17]

    J. C. Duchi and F. Ruan. Stochastic methods for composite and weakly convex optimization problems. SIAM Journal on Optimization, 28(4):3229–3259, 2018

  18. [18]

    Y. C. Eldar and S. Mendelson. Phase retrieval: Stability and recovery guarantees.Applied and Com- putational Harmonic Analysis, 36(3):473–494, 2014

  19. [19]

    X. Fan, I. Grama, and Q. Liu. Exponential inequalities for martingales with applications.Electronic Journal of Probability, 20:1–22, 2015

  20. [20]

    D. Kh. Fuk and S. V. Nagaev. Probability inequalities for sums of independent random variables.Theory of Probability and Its Applications, 16(4):643–660, 1971

  21. [21]

    Gao and Q

    W. Gao and Q. Deng. Stochastic weakly convex optimization beyond Lipschitz continuity. InProceedings of the 41st International Conference on Machine Learning (ICML), volume 235 ofPMLR, pages 14651– 14680, 2024. Also available asarXiv:2401.13971

  22. [22]

    Gratton, C

    S. Gratton, C. W. Royer, L. N. Vicente, and Z. Zhang. Complexity and global rates of trust-region methods based on probabilistic models.IMA Journal of Numerical Analysis, 38(3):1579–1597, 2018

  23. [23]

    B. Grimmer. Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity.SIAM Journal on Optimization, 29(2):1350–1365, 2019

  24. [24]

    Hong and J

    Y. Hong and J. Lin. High probability bounds on AdaGrad for constrained weakly convex optimization. Journal of Complexity, 86:101889, 2025

  25. [25]

    B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for line search based on stochastic oracles. InAdvances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 9193–9203, 2021. 24

  26. [26]

    B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024.doi:10.1137/22M1512764

  27. [27]

    B. Jin, K. Scheinberg, and M. Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, 209:651–679, 2025.doi:10.1007/s10107-024-02078-z

  28. [28]

    Lacoste-Julien, M

    S. Lacoste-Julien, M. Schmidt, and F. Bach. A simpler approach to obtaining anO(1/t) convergence rate for the projected stochastic subgradient method. Preprint,arXiv:1212.2002, 2012

  29. [29]

    Liao and Y

    F.-Y. Liao and Y. Zheng. A proximal descent method for minimizing weakly convex optimization. Preprint,arXiv:2509.02804, 2025

  30. [30]

    V. V. Mai and M. Johansson. Convergence of a stochastic gradient method with momentum for non- smooth non-convex optimization. InProceedings of the 37th International Conference on Machine Learning (ICML), volume 119 ofPMLR, pages 6630–6639, 2020

  31. [31]

    Menickelly, S

    M. Menickelly, S. M. Wild, and M. Xie. A stochastic quasi-Newton method in the absence of com- mon random numbers.Journal of Optimization Theory and Applications, 208(2):Article 84, 2026. doi:10.1007/s10957-025-02914-y

  32. [32]

    J.-J. Moreau. Proximit´ e et dualit´ e dans un espace hilbertien.Bulletin de la Soci´ et´ e Math´ ematique de France, 93:273–299, 1965

  33. [33]

    Paquette, H

    C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui. Catalyst for gradient-based non- convex optimization. InProceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), volume 84 ofPMLR, pages 613–622, 2018

  34. [34]

    Paquette and K

    C. Paquette and K. Scheinberg. A stochastic line search method with expected complexity analysis. SIAM Journal on Optimization, 30(1):349–376, 2020

  35. [35]

    R. T. Rockafellar. Monotone operators and the proximal point algorithm.SIAM Journal on Control and Optimization, 14(5):877–898, 1976

  36. [36]

    R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk.Journal of Risk, 2(3):21–41, 2000

  37. [37]

    R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions.Journal of Banking & Finance, 26(7):1443–1471, 2002

  38. [38]

    R. T. Rockafellar and R. J.-B. Wets.Variational Analysis, volume 317 ofGrundlehren der mathema- tischen Wissenschaften. Springer, 1998

  39. [39]

    Scheinberg and M

    K. Scheinberg and M. Xie. Stochastic adaptive optimization with unreliable inputs: a unified framework for high-probability complexity analysis. Preprint,arXiv:2511.19411, 2025

  40. [40]

    Scheinberg and M

    K. Scheinberg and M. Xie. First- and second-order stochastic adaptive regularization with cubics: high probability iteration and sample complexity.INFORMS Journal on Optimization, 2026. To appear. Also available asarXiv:2308.13161. 25