Adaptive Proximal Methods for Weakly Convex Optimization with Unknown Parameter: Deterministic and Stochastic Guarantees

Miaolan Xie

arxiv: 2606.17285 · v1 · pith:OEHJO6IYnew · submitted 2026-06-15 · 🧮 math.OC · cs.DS

Adaptive Proximal Methods for Weakly Convex Optimization with Unknown Parameter: Deterministic and Stochastic Guarantees

Miaolan Xie This is my paper

Pith reviewed 2026-06-27 02:06 UTC · model grok-4.3

classification 🧮 math.OC cs.DS

keywords weakly convex optimizationadaptive proximal methodsstochastic optimizationiteration complexityMoreau envelopesubgradient stationary pointproximal oracle

0 comments

The pith

Adaptive Prox-Guided Scheme minimizes ρ-weakly convex functions with unknown ρ at O(ε^{-2}) iteration complexity in deterministic and stochastic settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the Adaptive Prox-Guided Scheme (APS) for minimizing nonsmooth, nonconvex objectives that satisfy ρ-weak convexity when the parameter ρ is unknown to the algorithm. APS adjusts the proximal parameter online and in both directions by testing for sufficient descent, without assuming global Lipschitz continuity or smoothness of the objective. In the deterministic case the method produces an ε-subgradient stationary point after O(ε^{-2}) iterations. In the stochastic case the same order of iterations drives the gradient of the Moreau envelope below ε with high probability, even when function-difference estimates are biased and heavy-tailed and the stochastic proximal oracle is accurate with only constant probability when its parameter lies below 1/(2ρ).

Core claim

APS is a one-trial proximal algorithm that adapts its proximal parameter bidirectionally through a descent test; this adaptation yields an O(ε^{-2}) iteration bound for an ε-subgradient stationary point in the deterministic setting and a high-probability O(ε^{-2}) bound on the Moreau-envelope gradient in the stochastic setting, under the stated weak oracle assumptions that allow biased heavy-tailed estimates and limited-accuracy proximal oracles outside the regime where the parameter is below 1/(2ρ).

What carries the argument

Adaptive Prox-Guided Scheme (APS), which tunes the proximal parameter online via a descent test to exploit favorable local structure without prior knowledge of ρ.

If this is right

Optimization proceeds without any a priori bound on the weak-convexity parameter.
The same iteration order holds when function-value estimates are biased and heavy-tailed.
Only constant-probability accuracy of the proximal oracle is needed inside the regime parameter < 1/(2ρ).
The method applies to objectives that are neither globally Lipschitz nor smooth.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bidirectional adaptation rule could be ported to other parameter-tuning problems such as unknown smoothness constants in smooth nonconvex optimization.
The high-probability analysis under minimal oracle accuracy may extend to variance-reduced or momentum-based variants of proximal methods.
The descent-test mechanism suggests a template for online parameter selection in composite problems with multiple unknown constants.

Load-bearing premise

The stochastic proximal oracle is required to be sufficiently accurate with constant probability precisely when the proximal parameter lies below 1/(2ρ) and may be arbitrary otherwise.

What would settle it

A concrete instance in which the proximal oracle satisfies the accuracy condition only for parameters above 1/(2ρ) yet APS still produces the claimed high-probability O(ε^{-2}) bound on the Moreau-envelope gradient.

Figures

Figures reproduced from arXiv: 2606.17285 by Miaolan Xie.

read the original abstract

Many nonsmooth, nonconvex objectives in learning and signal recovery are $\rho$-weakly convex. We minimize such a function in deterministic and stochastic settings when the weak-convexity parameter $\rho$ is unknown. The objective is not required to be globally Lipschitz continuous or smooth. We propose the Adaptive Prox-Guided Scheme (APS), a one-trial proximal algorithm that adapts the proximal parameter online and bidirectionally through a descent test, allowing it to exploit favorable local structure. In the deterministic setting, APS obtains an $O(\varepsilon^{-2})$ iteration complexity for producing an $\varepsilon$-subgradient stationary point. In the stochastic setting, APS achieves a high-probability $O(\varepsilon^{-2})$ iteration bound for driving the Moreau-envelope gradient below $\varepsilon$. This result holds under deliberately weak oracle assumptions: the function-difference estimates may be biased and heavy-tailed, and the stochastic proximal oracle need only be sufficiently accurate with constant probability when the proximal parameter lies below $1/(2\rho)$ (unknown to the algorithm), and can be arbitrary otherwise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

APS adapts the proximal parameter bidirectionally for unknown ρ and claims standard O(ε^{-2}) rates under unusually weak oracles, but the stochastic high-probability bound rests on a conditional accuracy assumption that looks fragile.

read the letter

The core idea is a one-trial proximal scheme that adjusts the parameter online in both directions via a descent test, so the algorithm does not need to know ρ ahead of time. That removes a common practical barrier in weakly convex problems from learning and signal processing.

The deterministic guarantee looks straightforward: O(ε^{-2}) iterations to an ε-subgradient stationary point. The stochastic version keeps the same rate in high probability while allowing biased, heavy-tailed function-difference estimates and a proximal oracle that only needs to be accurate with constant probability when the parameter drops below 1/(2ρ). The paper frames this conditional oracle model as weaker than prior assumptions.

The soft spot is exactly the conditional accuracy rule. Because ρ is unknown, the algorithm cannot directly check whether the current parameter satisfies the threshold, yet the high-probability argument uses that threshold to control failure probability across iterations. If the oracle's behavior does not line up with the parameter in the precise way stated, the bound does not go through even if the deterministic case survives. The abstract gives no numerical checks or proof sketches, so it is impossible to judge how tightly the analysis depends on that correlation.

The work is aimed at researchers who already work with proximal methods for nonsmooth nonconvex objectives and want to drop the known-ρ requirement. A reader looking for adaptive first-order schemes would find the descent-test mechanism worth examining.

It is worth sending to referees. The adaptation mechanism addresses a real gap, the complexity targets are standard, and the oracle weakening is explicit enough that reviewers can check whether the conditional assumption is realistic or merely convenient.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes the Adaptive Prox-Guided Scheme (APS), a proximal algorithm that adapts the proximal parameter bidirectionally via a descent test for minimizing ρ-weakly convex functions when ρ is unknown. It establishes an O(ε^{-2}) iteration complexity for an ε-subgradient stationary point in the deterministic setting and a high-probability O(ε^{-2}) bound on the Moreau-envelope gradient norm in the stochastic setting. The stochastic result holds under weak oracle assumptions allowing biased heavy-tailed function-difference estimates and a stochastic proximal oracle that is sufficiently accurate with constant probability only when the proximal parameter lies below 1/(2ρ) (and arbitrary otherwise).

Significance. If the proofs hold, the contribution is significant for practical nonsmooth nonconvex optimization where the weak-convexity parameter is typically unknown and stochastic oracles are noisy. The bidirectional adaptation and the ability to obtain high-probability guarantees under deliberately relaxed oracle conditions (including the conditional proximal-oracle accuracy) distinguish the work from prior adaptive proximal methods. The deterministic result is parameter-free in the stated sense and the stochastic high-probability bound is obtained without requiring global Lipschitzness or unbiased oracles.

minor comments (2)

The precise statement of the stochastic proximal-oracle assumption (tied to the threshold 1/(2ρ)) should be restated in the main theorem statement rather than only in the abstract and assumption section, to make the load-bearing hypothesis immediately visible to readers.
Notation for the Moreau envelope gradient and the subgradient stationarity measure should be unified across the deterministic and stochastic sections to avoid minor confusion when comparing the two complexity results.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. The referee summary accurately captures the APS algorithm, its bidirectional adaptation, the O(ε^{-2}) guarantees, and the deliberately weak oracle assumptions in both settings. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; bounds derived under external oracle assumptions

full rationale

The paper states explicit complexity results (O(ε^{-2}) deterministic and high-probability stochastic) under deliberately weak, externally stated oracle assumptions on function-difference estimates and conditional proximal-oracle accuracy. No equations, reductions, or self-citations are visible that equate the target bounds to fitted inputs, self-definitions, or prior author results by construction. The adaptation rule and high-probability control are presented as consequences of the stated assumptions rather than tautological with them.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only abstract available; ledger populated from stated assumptions in abstract. No free parameters or invented entities are visible.

axioms (2)

domain assumption Objective is ρ-weakly convex for some unknown ρ > 0
Central modeling assumption stated in first sentence of abstract.
domain assumption Stochastic proximal oracle is accurate with constant probability only when proximal parameter < 1/(2ρ)
Explicitly invoked to obtain the stochastic guarantee.

pith-pipeline@v0.9.1-grok · 5715 in / 1313 out tokens · 37333 ms · 2026-06-27T02:06:13.883876+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 4 canonical work pages

[1]

Alacaoglu, Y

A. Alacaoglu, Y. Malitsky, and V. Cevher. Convergence of adaptive algorithms for constrained weakly convex optimization. InAdvances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 14214–14225, 2021. Also available asarXiv:2006.06650

arXiv 2021
[2]

K. Azuma. Weighted sums of certain dependent random variables.Tˆ ohoku Mathematical Journal, 19(3):357–367, 1967

1967
[3]

A. S. Berahas, M. Xie, and B. Zhou. A sequential quadratic programming method with high-probability complexity bounds for nonlinear equality-constrained stochastic optimization.SIAM Journal on Opti- mization, 35(1):240–269, 2025.doi:10.1137/23M1549006

work page doi:10.1137/23m1549006 2025
[4]

Blanchet, C

J. Blanchet, C. Cartis, M. Menickelly, and K. Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019

2019
[5]

B¨ ohm and S

A. B¨ ohm and S. J. Wright. Variable smoothing for weakly convex composite functions.Journal of Optimization Theory and Applications, 188:628–649, 2021

2021
[6]

E. J. Cand` es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis?Journal of the ACM, 58(3):Article 11, 2011

2011
[7]

L. Cao, A. S. Berahas, and K. Scheinberg. First- and second-order high probability complexity bounds for trust-region methods with noisy oracles.Mathematical Programming, 207(1–2):55–106, 2024. 23

2024
[8]

Cartis and K

C. Cartis and K. Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models.Mathematical Programming, 169(2):337–375, 2018

2018
[9]

Chandrasekaran, S

V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition.SIAM Journal on Optimization, 21(2):572–596, 2011

2011
[10]

Charisopoulos, Y

V. Charisopoulos, Y. Chen, D. Davis, M. D´ ıaz, L. Ding, and D. Drusvyatskiy. Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence.Foundations of Computational Mathematics, 21(6):1505–1593, 2021

2021
[11]

R. Chen, M. Menickelly, and K. Scheinberg. Stochastic optimization using a trust-region method and random models.Mathematical Programming, 169(2):447–487, 2018

2018
[12]

Davis and D

D. Davis and D. Drusvyatskiy. Stochastic model-based minimization of weakly convex functions.SIAM Journal on Optimization, 29(1):207–239, 2019

2019
[13]

Davis and B

D. Davis and B. Grimmer. Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems.SIAM Journal on Optimization, 29(3):1908–1930, 2019

1908
[14]

Deng and W

Q. Deng and W. Gao. A smooth approximation framework for weakly convex optimization. Preprint, arXiv:2512.09720, 2025

arXiv 2025
[15]

Drusvyatskiy

D. Drusvyatskiy. The proximal point method revisited.SIAG/OPT Views and News, 26(1):1–8, 2018. Also available asarXiv:1712.06038

Pith/arXiv arXiv 2018
[16]

Drusvyatskiy and C

D. Drusvyatskiy and C. Paquette. Efficiency of minimizing compositions of convex functions and smooth maps.Mathematical Programming, 178(1–2):503–558, 2019

2019
[17]

J. C. Duchi and F. Ruan. Stochastic methods for composite and weakly convex optimization problems. SIAM Journal on Optimization, 28(4):3229–3259, 2018

2018
[18]

Y. C. Eldar and S. Mendelson. Phase retrieval: Stability and recovery guarantees.Applied and Com- putational Harmonic Analysis, 36(3):473–494, 2014

2014
[19]

X. Fan, I. Grama, and Q. Liu. Exponential inequalities for martingales with applications.Electronic Journal of Probability, 20:1–22, 2015

2015
[20]

D. Kh. Fuk and S. V. Nagaev. Probability inequalities for sums of independent random variables.Theory of Probability and Its Applications, 16(4):643–660, 1971

1971
[21]

Gao and Q

W. Gao and Q. Deng. Stochastic weakly convex optimization beyond Lipschitz continuity. InProceedings of the 41st International Conference on Machine Learning (ICML), volume 235 ofPMLR, pages 14651– 14680, 2024. Also available asarXiv:2401.13971

arXiv 2024
[22]

Gratton, C

S. Gratton, C. W. Royer, L. N. Vicente, and Z. Zhang. Complexity and global rates of trust-region methods based on probabilistic models.IMA Journal of Numerical Analysis, 38(3):1579–1597, 2018

2018
[23]

B. Grimmer. Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity.SIAM Journal on Optimization, 29(2):1350–1365, 2019

2019
[24]

Hong and J

Y. Hong and J. Lin. High probability bounds on AdaGrad for constrained weakly convex optimization. Journal of Complexity, 86:101889, 2025

2025
[25]

B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for line search based on stochastic oracles. InAdvances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 9193–9203, 2021. 24

2021
[26]

B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024.doi:10.1137/22M1512764

work page doi:10.1137/22m1512764 2024
[27]

B. Jin, K. Scheinberg, and M. Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, 209:651–679, 2025.doi:10.1007/s10107-024-02078-z

work page doi:10.1007/s10107-024-02078-z 2025
[28]

Lacoste-Julien, M

S. Lacoste-Julien, M. Schmidt, and F. Bach. A simpler approach to obtaining anO(1/t) convergence rate for the projected stochastic subgradient method. Preprint,arXiv:1212.2002, 2012

Pith/arXiv arXiv 2002
[29]

Liao and Y

F.-Y. Liao and Y. Zheng. A proximal descent method for minimizing weakly convex optimization. Preprint,arXiv:2509.02804, 2025

arXiv 2025
[30]

V. V. Mai and M. Johansson. Convergence of a stochastic gradient method with momentum for non- smooth non-convex optimization. InProceedings of the 37th International Conference on Machine Learning (ICML), volume 119 ofPMLR, pages 6630–6639, 2020

2020
[31]

Menickelly, S

M. Menickelly, S. M. Wild, and M. Xie. A stochastic quasi-Newton method in the absence of com- mon random numbers.Journal of Optimization Theory and Applications, 208(2):Article 84, 2026. doi:10.1007/s10957-025-02914-y

work page doi:10.1007/s10957-025-02914-y 2026
[32]

J.-J. Moreau. Proximit´ e et dualit´ e dans un espace hilbertien.Bulletin de la Soci´ et´ e Math´ ematique de France, 93:273–299, 1965

1965
[33]

Paquette, H

C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui. Catalyst for gradient-based non- convex optimization. InProceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), volume 84 ofPMLR, pages 613–622, 2018

2018
[34]

Paquette and K

C. Paquette and K. Scheinberg. A stochastic line search method with expected complexity analysis. SIAM Journal on Optimization, 30(1):349–376, 2020

2020
[35]

R. T. Rockafellar. Monotone operators and the proximal point algorithm.SIAM Journal on Control and Optimization, 14(5):877–898, 1976

1976
[36]

R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk.Journal of Risk, 2(3):21–41, 2000

2000
[37]

R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions.Journal of Banking & Finance, 26(7):1443–1471, 2002

2002
[38]

R. T. Rockafellar and R. J.-B. Wets.Variational Analysis, volume 317 ofGrundlehren der mathema- tischen Wissenschaften. Springer, 1998

1998
[39]

Scheinberg and M

K. Scheinberg and M. Xie. Stochastic adaptive optimization with unreliable inputs: a unified framework for high-probability complexity analysis. Preprint,arXiv:2511.19411, 2025

arXiv 2025
[40]

Scheinberg and M

K. Scheinberg and M. Xie. First- and second-order stochastic adaptive regularization with cubics: high probability iteration and sample complexity.INFORMS Journal on Optimization, 2026. To appear. Also available asarXiv:2308.13161. 25

arXiv 2026

[1] [1]

Alacaoglu, Y

A. Alacaoglu, Y. Malitsky, and V. Cevher. Convergence of adaptive algorithms for constrained weakly convex optimization. InAdvances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 14214–14225, 2021. Also available asarXiv:2006.06650

arXiv 2021

[2] [2]

K. Azuma. Weighted sums of certain dependent random variables.Tˆ ohoku Mathematical Journal, 19(3):357–367, 1967

1967

[3] [3]

A. S. Berahas, M. Xie, and B. Zhou. A sequential quadratic programming method with high-probability complexity bounds for nonlinear equality-constrained stochastic optimization.SIAM Journal on Opti- mization, 35(1):240–269, 2025.doi:10.1137/23M1549006

work page doi:10.1137/23m1549006 2025

[4] [4]

Blanchet, C

J. Blanchet, C. Cartis, M. Menickelly, and K. Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019

2019

[5] [5]

B¨ ohm and S

A. B¨ ohm and S. J. Wright. Variable smoothing for weakly convex composite functions.Journal of Optimization Theory and Applications, 188:628–649, 2021

2021

[6] [6]

E. J. Cand` es, X. Li, Y. Ma, and J. Wright. Robust principal component analysis?Journal of the ACM, 58(3):Article 11, 2011

2011

[7] [7]

L. Cao, A. S. Berahas, and K. Scheinberg. First- and second-order high probability complexity bounds for trust-region methods with noisy oracles.Mathematical Programming, 207(1–2):55–106, 2024. 23

2024

[8] [8]

Cartis and K

C. Cartis and K. Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models.Mathematical Programming, 169(2):337–375, 2018

2018

[9] [9]

Chandrasekaran, S

V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition.SIAM Journal on Optimization, 21(2):572–596, 2011

2011

[10] [10]

Charisopoulos, Y

V. Charisopoulos, Y. Chen, D. Davis, M. D´ ıaz, L. Ding, and D. Drusvyatskiy. Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence.Foundations of Computational Mathematics, 21(6):1505–1593, 2021

2021

[11] [11]

R. Chen, M. Menickelly, and K. Scheinberg. Stochastic optimization using a trust-region method and random models.Mathematical Programming, 169(2):447–487, 2018

2018

[12] [12]

Davis and D

D. Davis and D. Drusvyatskiy. Stochastic model-based minimization of weakly convex functions.SIAM Journal on Optimization, 29(1):207–239, 2019

2019

[13] [13]

Davis and B

D. Davis and B. Grimmer. Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems.SIAM Journal on Optimization, 29(3):1908–1930, 2019

1908

[14] [14]

Deng and W

Q. Deng and W. Gao. A smooth approximation framework for weakly convex optimization. Preprint, arXiv:2512.09720, 2025

arXiv 2025

[15] [15]

Drusvyatskiy

D. Drusvyatskiy. The proximal point method revisited.SIAG/OPT Views and News, 26(1):1–8, 2018. Also available asarXiv:1712.06038

Pith/arXiv arXiv 2018

[16] [16]

Drusvyatskiy and C

D. Drusvyatskiy and C. Paquette. Efficiency of minimizing compositions of convex functions and smooth maps.Mathematical Programming, 178(1–2):503–558, 2019

2019

[17] [17]

J. C. Duchi and F. Ruan. Stochastic methods for composite and weakly convex optimization problems. SIAM Journal on Optimization, 28(4):3229–3259, 2018

2018

[18] [18]

Y. C. Eldar and S. Mendelson. Phase retrieval: Stability and recovery guarantees.Applied and Com- putational Harmonic Analysis, 36(3):473–494, 2014

2014

[19] [19]

X. Fan, I. Grama, and Q. Liu. Exponential inequalities for martingales with applications.Electronic Journal of Probability, 20:1–22, 2015

2015

[20] [20]

D. Kh. Fuk and S. V. Nagaev. Probability inequalities for sums of independent random variables.Theory of Probability and Its Applications, 16(4):643–660, 1971

1971

[21] [21]

Gao and Q

W. Gao and Q. Deng. Stochastic weakly convex optimization beyond Lipschitz continuity. InProceedings of the 41st International Conference on Machine Learning (ICML), volume 235 ofPMLR, pages 14651– 14680, 2024. Also available asarXiv:2401.13971

arXiv 2024

[22] [22]

Gratton, C

S. Gratton, C. W. Royer, L. N. Vicente, and Z. Zhang. Complexity and global rates of trust-region methods based on probabilistic models.IMA Journal of Numerical Analysis, 38(3):1579–1597, 2018

2018

[23] [23]

B. Grimmer. Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity.SIAM Journal on Optimization, 29(2):1350–1365, 2019

2019

[24] [24]

Hong and J

Y. Hong and J. Lin. High probability bounds on AdaGrad for constrained weakly convex optimization. Journal of Complexity, 86:101889, 2025

2025

[25] [25]

B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for line search based on stochastic oracles. InAdvances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 9193–9203, 2021. 24

2021

[26] [26]

B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024.doi:10.1137/22M1512764

work page doi:10.1137/22m1512764 2024

[27] [27]

B. Jin, K. Scheinberg, and M. Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, 209:651–679, 2025.doi:10.1007/s10107-024-02078-z

work page doi:10.1007/s10107-024-02078-z 2025

[28] [28]

Lacoste-Julien, M

S. Lacoste-Julien, M. Schmidt, and F. Bach. A simpler approach to obtaining anO(1/t) convergence rate for the projected stochastic subgradient method. Preprint,arXiv:1212.2002, 2012

Pith/arXiv arXiv 2002

[29] [29]

Liao and Y

F.-Y. Liao and Y. Zheng. A proximal descent method for minimizing weakly convex optimization. Preprint,arXiv:2509.02804, 2025

arXiv 2025

[30] [30]

V. V. Mai and M. Johansson. Convergence of a stochastic gradient method with momentum for non- smooth non-convex optimization. InProceedings of the 37th International Conference on Machine Learning (ICML), volume 119 ofPMLR, pages 6630–6639, 2020

2020

[31] [31]

Menickelly, S

M. Menickelly, S. M. Wild, and M. Xie. A stochastic quasi-Newton method in the absence of com- mon random numbers.Journal of Optimization Theory and Applications, 208(2):Article 84, 2026. doi:10.1007/s10957-025-02914-y

work page doi:10.1007/s10957-025-02914-y 2026

[32] [32]

J.-J. Moreau. Proximit´ e et dualit´ e dans un espace hilbertien.Bulletin de la Soci´ et´ e Math´ ematique de France, 93:273–299, 1965

1965

[33] [33]

Paquette, H

C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui. Catalyst for gradient-based non- convex optimization. InProceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), volume 84 ofPMLR, pages 613–622, 2018

2018

[34] [34]

Paquette and K

C. Paquette and K. Scheinberg. A stochastic line search method with expected complexity analysis. SIAM Journal on Optimization, 30(1):349–376, 2020

2020

[35] [35]

R. T. Rockafellar. Monotone operators and the proximal point algorithm.SIAM Journal on Control and Optimization, 14(5):877–898, 1976

1976

[36] [36]

R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk.Journal of Risk, 2(3):21–41, 2000

2000

[37] [37]

R. T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions.Journal of Banking & Finance, 26(7):1443–1471, 2002

2002

[38] [38]

R. T. Rockafellar and R. J.-B. Wets.Variational Analysis, volume 317 ofGrundlehren der mathema- tischen Wissenschaften. Springer, 1998

1998

[39] [39]

Scheinberg and M

K. Scheinberg and M. Xie. Stochastic adaptive optimization with unreliable inputs: a unified framework for high-probability complexity analysis. Preprint,arXiv:2511.19411, 2025

arXiv 2025

[40] [40]

Scheinberg and M

K. Scheinberg and M. Xie. First- and second-order stochastic adaptive regularization with cubics: high probability iteration and sample complexity.INFORMS Journal on Optimization, 2026. To appear. Also available asarXiv:2308.13161. 25

arXiv 2026