Adaptive Proximal Methods for Weakly Convex Optimization with Unknown Parameter: Deterministic and Stochastic Guarantees
Pith reviewed 2026-06-27 02:06 UTC · model grok-4.3
The pith
Adaptive Prox-Guided Scheme minimizes ρ-weakly convex functions with unknown ρ at O(ε^{-2}) iteration complexity in deterministic and stochastic settings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
APS is a one-trial proximal algorithm that adapts its proximal parameter bidirectionally through a descent test; this adaptation yields an O(ε^{-2}) iteration bound for an ε-subgradient stationary point in the deterministic setting and a high-probability O(ε^{-2}) bound on the Moreau-envelope gradient in the stochastic setting, under the stated weak oracle assumptions that allow biased heavy-tailed estimates and limited-accuracy proximal oracles outside the regime where the parameter is below 1/(2ρ).
What carries the argument
Adaptive Prox-Guided Scheme (APS), which tunes the proximal parameter online via a descent test to exploit favorable local structure without prior knowledge of ρ.
If this is right
- Optimization proceeds without any a priori bound on the weak-convexity parameter.
- The same iteration order holds when function-value estimates are biased and heavy-tailed.
- Only constant-probability accuracy of the proximal oracle is needed inside the regime parameter < 1/(2ρ).
- The method applies to objectives that are neither globally Lipschitz nor smooth.
Where Pith is reading between the lines
- The bidirectional adaptation rule could be ported to other parameter-tuning problems such as unknown smoothness constants in smooth nonconvex optimization.
- The high-probability analysis under minimal oracle accuracy may extend to variance-reduced or momentum-based variants of proximal methods.
- The descent-test mechanism suggests a template for online parameter selection in composite problems with multiple unknown constants.
Load-bearing premise
The stochastic proximal oracle is required to be sufficiently accurate with constant probability precisely when the proximal parameter lies below 1/(2ρ) and may be arbitrary otherwise.
What would settle it
A concrete instance in which the proximal oracle satisfies the accuracy condition only for parameters above 1/(2ρ) yet APS still produces the claimed high-probability O(ε^{-2}) bound on the Moreau-envelope gradient.
Figures
read the original abstract
Many nonsmooth, nonconvex objectives in learning and signal recovery are $\rho$-weakly convex. We minimize such a function in deterministic and stochastic settings when the weak-convexity parameter $\rho$ is unknown. The objective is not required to be globally Lipschitz continuous or smooth. We propose the Adaptive Prox-Guided Scheme (APS), a one-trial proximal algorithm that adapts the proximal parameter online and bidirectionally through a descent test, allowing it to exploit favorable local structure. In the deterministic setting, APS obtains an $O(\varepsilon^{-2})$ iteration complexity for producing an $\varepsilon$-subgradient stationary point. In the stochastic setting, APS achieves a high-probability $O(\varepsilon^{-2})$ iteration bound for driving the Moreau-envelope gradient below $\varepsilon$. This result holds under deliberately weak oracle assumptions: the function-difference estimates may be biased and heavy-tailed, and the stochastic proximal oracle need only be sufficiently accurate with constant probability when the proximal parameter lies below $1/(2\rho)$ (unknown to the algorithm), and can be arbitrary otherwise.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Adaptive Prox-Guided Scheme (APS), a proximal algorithm that adapts the proximal parameter bidirectionally via a descent test for minimizing ρ-weakly convex functions when ρ is unknown. It establishes an O(ε^{-2}) iteration complexity for an ε-subgradient stationary point in the deterministic setting and a high-probability O(ε^{-2}) bound on the Moreau-envelope gradient norm in the stochastic setting. The stochastic result holds under weak oracle assumptions allowing biased heavy-tailed function-difference estimates and a stochastic proximal oracle that is sufficiently accurate with constant probability only when the proximal parameter lies below 1/(2ρ) (and arbitrary otherwise).
Significance. If the proofs hold, the contribution is significant for practical nonsmooth nonconvex optimization where the weak-convexity parameter is typically unknown and stochastic oracles are noisy. The bidirectional adaptation and the ability to obtain high-probability guarantees under deliberately relaxed oracle conditions (including the conditional proximal-oracle accuracy) distinguish the work from prior adaptive proximal methods. The deterministic result is parameter-free in the stated sense and the stochastic high-probability bound is obtained without requiring global Lipschitzness or unbiased oracles.
minor comments (2)
- The precise statement of the stochastic proximal-oracle assumption (tied to the threshold 1/(2ρ)) should be restated in the main theorem statement rather than only in the abstract and assumption section, to make the load-bearing hypothesis immediately visible to readers.
- Notation for the Moreau envelope gradient and the subgradient stationarity measure should be unified across the deterministic and stochastic sections to avoid minor confusion when comparing the two complexity results.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation of minor revision. The referee summary accurately captures the APS algorithm, its bidirectional adaptation, the O(ε^{-2}) guarantees, and the deliberately weak oracle assumptions in both settings. No major comments were provided in the report.
Circularity Check
No significant circularity; bounds derived under external oracle assumptions
full rationale
The paper states explicit complexity results (O(ε^{-2}) deterministic and high-probability stochastic) under deliberately weak, externally stated oracle assumptions on function-difference estimates and conditional proximal-oracle accuracy. No equations, reductions, or self-citations are visible that equate the target bounds to fitted inputs, self-definitions, or prior author results by construction. The adaptation rule and high-probability control are presented as consequences of the stated assumptions rather than tautological with them.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Objective is ρ-weakly convex for some unknown ρ > 0
- domain assumption Stochastic proximal oracle is accurate with constant probability only when proximal parameter < 1/(2ρ)
Reference graph
Works this paper leans on
-
[1]
A. Alacaoglu, Y. Malitsky, and V. Cevher. Convergence of adaptive algorithms for constrained weakly convex optimization. InAdvances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 14214–14225, 2021. Also available asarXiv:2006.06650
arXiv 2021
-
[2]
K. Azuma. Weighted sums of certain dependent random variables.Tˆ ohoku Mathematical Journal, 19(3):357–367, 1967
1967
-
[3]
A. S. Berahas, M. Xie, and B. Zhou. A sequential quadratic programming method with high-probability complexity bounds for nonlinear equality-constrained stochastic optimization.SIAM Journal on Opti- mization, 35(1):240–269, 2025.doi:10.1137/23M1549006
-
[4]
Blanchet, C
J. Blanchet, C. Cartis, M. Menickelly, and K. Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019
2019
-
[5]
B¨ ohm and S
A. B¨ ohm and S. J. Wright. Variable smoothing for weakly convex composite functions.Journal of Optimization Theory and Applications, 188:628–649, 2021
2021
-
[6]
E. J. Cand` es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis?Journal of the ACM, 58(3):Article 11, 2011
2011
-
[7]
L. Cao, A. S. Berahas, and K. Scheinberg. First- and second-order high probability complexity bounds for trust-region methods with noisy oracles.Mathematical Programming, 207(1–2):55–106, 2024. 23
2024
-
[8]
Cartis and K
C. Cartis and K. Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models.Mathematical Programming, 169(2):337–375, 2018
2018
-
[9]
Chandrasekaran, S
V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition.SIAM Journal on Optimization, 21(2):572–596, 2011
2011
-
[10]
Charisopoulos, Y
V. Charisopoulos, Y. Chen, D. Davis, M. D´ ıaz, L. Ding, and D. Drusvyatskiy. Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence.Foundations of Computational Mathematics, 21(6):1505–1593, 2021
2021
-
[11]
R. Chen, M. Menickelly, and K. Scheinberg. Stochastic optimization using a trust-region method and random models.Mathematical Programming, 169(2):447–487, 2018
2018
-
[12]
Davis and D
D. Davis and D. Drusvyatskiy. Stochastic model-based minimization of weakly convex functions.SIAM Journal on Optimization, 29(1):207–239, 2019
2019
-
[13]
Davis and B
D. Davis and B. Grimmer. Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems.SIAM Journal on Optimization, 29(3):1908–1930, 2019
1908
-
[14]
Q. Deng and W. Gao. A smooth approximation framework for weakly convex optimization. Preprint, arXiv:2512.09720, 2025
arXiv 2025
-
[15]
D. Drusvyatskiy. The proximal point method revisited.SIAG/OPT Views and News, 26(1):1–8, 2018. Also available asarXiv:1712.06038
Pith/arXiv arXiv 2018
-
[16]
Drusvyatskiy and C
D. Drusvyatskiy and C. Paquette. Efficiency of minimizing compositions of convex functions and smooth maps.Mathematical Programming, 178(1–2):503–558, 2019
2019
-
[17]
J. C. Duchi and F. Ruan. Stochastic methods for composite and weakly convex optimization problems. SIAM Journal on Optimization, 28(4):3229–3259, 2018
2018
-
[18]
Y. C. Eldar and S. Mendelson. Phase retrieval: Stability and recovery guarantees.Applied and Com- putational Harmonic Analysis, 36(3):473–494, 2014
2014
-
[19]
X. Fan, I. Grama, and Q. Liu. Exponential inequalities for martingales with applications.Electronic Journal of Probability, 20:1–22, 2015
2015
-
[20]
D. Kh. Fuk and S. V. Nagaev. Probability inequalities for sums of independent random variables.Theory of Probability and Its Applications, 16(4):643–660, 1971
1971
- [21]
-
[22]
Gratton, C
S. Gratton, C. W. Royer, L. N. Vicente, and Z. Zhang. Complexity and global rates of trust-region methods based on probabilistic models.IMA Journal of Numerical Analysis, 38(3):1579–1597, 2018
2018
-
[23]
B. Grimmer. Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity.SIAM Journal on Optimization, 29(2):1350–1365, 2019
2019
-
[24]
Hong and J
Y. Hong and J. Lin. High probability bounds on AdaGrad for constrained weakly convex optimization. Journal of Complexity, 86:101889, 2025
2025
-
[25]
B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for line search based on stochastic oracles. InAdvances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 9193–9203, 2021. 24
2021
-
[26]
B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024.doi:10.1137/22M1512764
-
[27]
B. Jin, K. Scheinberg, and M. Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, 209:651–679, 2025.doi:10.1007/s10107-024-02078-z
-
[28]
S. Lacoste-Julien, M. Schmidt, and F. Bach. A simpler approach to obtaining anO(1/t) convergence rate for the projected stochastic subgradient method. Preprint,arXiv:1212.2002, 2012
Pith/arXiv arXiv 2002
-
[29]
F.-Y. Liao and Y. Zheng. A proximal descent method for minimizing weakly convex optimization. Preprint,arXiv:2509.02804, 2025
arXiv 2025
-
[30]
V. V. Mai and M. Johansson. Convergence of a stochastic gradient method with momentum for non- smooth non-convex optimization. InProceedings of the 37th International Conference on Machine Learning (ICML), volume 119 ofPMLR, pages 6630–6639, 2020
2020
-
[31]
M. Menickelly, S. M. Wild, and M. Xie. A stochastic quasi-Newton method in the absence of com- mon random numbers.Journal of Optimization Theory and Applications, 208(2):Article 84, 2026. doi:10.1007/s10957-025-02914-y
-
[32]
J.-J. Moreau. Proximit´ e et dualit´ e dans un espace hilbertien.Bulletin de la Soci´ et´ e Math´ ematique de France, 93:273–299, 1965
1965
-
[33]
Paquette, H
C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui. Catalyst for gradient-based non- convex optimization. InProceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), volume 84 ofPMLR, pages 613–622, 2018
2018
-
[34]
Paquette and K
C. Paquette and K. Scheinberg. A stochastic line search method with expected complexity analysis. SIAM Journal on Optimization, 30(1):349–376, 2020
2020
-
[35]
R. T. Rockafellar. Monotone operators and the proximal point algorithm.SIAM Journal on Control and Optimization, 14(5):877–898, 1976
1976
-
[36]
R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk.Journal of Risk, 2(3):21–41, 2000
2000
-
[37]
R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions.Journal of Banking & Finance, 26(7):1443–1471, 2002
2002
-
[38]
R. T. Rockafellar and R. J.-B. Wets.Variational Analysis, volume 317 ofGrundlehren der mathema- tischen Wissenschaften. Springer, 1998
1998
-
[39]
K. Scheinberg and M. Xie. Stochastic adaptive optimization with unreliable inputs: a unified framework for high-probability complexity analysis. Preprint,arXiv:2511.19411, 2025
arXiv 2025
-
[40]
K. Scheinberg and M. Xie. First- and second-order stochastic adaptive regularization with cubics: high probability iteration and sample complexity.INFORMS Journal on Optimization, 2026. To appear. Also available asarXiv:2308.13161. 25
arXiv 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.