Uniformly Optimal and Parameter-free First-order Methods for Convex and Function-constrained Optimization

Guanghui Lan; Qi Deng; Zhenwei Lin

arxiv: 2412.06319 · v3 · submitted 2024-12-09 · 🧮 math.OC

Uniformly Optimal and Parameter-free First-order Methods for Convex and Function-constrained Optimization

Qi Deng , Guanghui Lan , Zhenwei Lin This is my paper

Pith reviewed 2026-05-23 07:39 UTC · model grok-4.3

classification 🧮 math.OC

keywords first-order methodsconvex optimizationfunctional constraintsparameter-freeoptimal oracle complexityHölder smoothnessaccelerated methodslevel-set methods

0 comments

The pith

A parameter-free first-order method achieves the optimal oracle complexity for convex optimization with functional constraints under Hölder smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops first-order methods for convex optimization problems that include convex functional constraints. These methods are designed to be parameter-free, meaning they do not require knowledge of smoothness levels or Hölder continuity parameters, and they avoid line searches. For problems where the optimal objective value is known, an accelerated method with a modified Polyak step size and Nesterov's momentum is proposed. When the optimal value is unknown, the problem is reformulated as a root-finding task solved using the accelerated prox-level method. In both cases, the methods attain the optimal rate of O(ε^{-2/(1+3ρ)}) under Hölder smoothness assumptions using only function and gradient evaluations.

Core claim

The central claim is that first-order methods can be made uniformly optimal and parameter-free for convex optimization with convex functional constraints by enabling oracles for diagonal quadratic programs in subproblems; when the optimal value f* is known the method uses a modified Polyak stepsize with Nesterov momentum, and when f* is unknown the problem is recast as root-finding solved via inexact fixed-point iteration and the accelerated prox-level method, both attaining the bound O(ε^{-2/(1+3ρ)}) without knowledge of smoothness or Hölder parameters.

What carries the argument

The accelerated prox-level (APL) method used inside level-set reformulations when f* is unknown, together with the modified Polyak stepsize and Nesterov momentum when f* is known.

If this is right

The methods apply directly to feasibility problems and overparameterized models where f* is known.
Root-finding subproblems are solved to relative accuracy by first-order methods without losing the overall optimal rate.
The same complexity bound holds for convex function-constrained optimization as for problems with simple constraints.
No line search or prior structural knowledge is needed to reach the Hölder-optimal oracle complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Implementation in machine learning pipelines could become simpler because no separate estimation of smoothness constants is required.
The root-finding reduction suggests that similar inexact fixed-point schemes might produce parameter-free variants for other classes of constrained problems.
The approach separates the outer root search from the inner first-order solver, which could allow modular replacement of the inner solver while preserving optimality.

Load-bearing premise

The subproblems can be solved using computational oracles for diagonal quadratic programs and inexact first-order methods to a specified relative accuracy.

What would settle it

A concrete counter-example or numerical test on a Hölder-smooth function-constrained problem in which the observed number of gradient evaluations exceeds the stated O(ε^{-2/(1+3ρ)}) bound once all smoothness and Hölder parameters are hidden from the algorithm.

Figures

Figures reproduced from arXiv: 2412.06319 by Guanghui Lan, Qi Deng, Zhenwei Lin.

**Figure 1.** Figure 1: Objective value (dK(uk) + dK(sk)) v.s. number of gradient evaluations in the SOCP problem. Left is 500 variables, 200 equality constraints and 10 cones each of dimension 50, right is 1000 variables, 800 equality constraints and 10 cones each of dimension 100 5.1 SOCPs In this section, we solve standard SOCP, which can be equivalently expressed as the following penalty problem [13]: min x dK(u) + dK∗ (s) s.… view at source ↗

**Figure 2.** Figure 2: Objective value (∥[σmax(I−Xk)]+∥+∥[σmax(A⊤ i Xk+XkA⊤ i )]+∥) v.s. number of gradient evaluations in LMI. Left is LMI problem with parameter (q, k) = (20, 10) and right is LMI problem with parameter (q, k) = (40, 20). 5.2 LMIs The second experiment focuses on a Linear Matrix Inequality (LMI) problem, which seeks a symmetric matrix X ∈ S q×q such that the following inequalities hold: X ⪰ I, A⊤ i X + XAi ⪯ 0,… view at source ↗

**Figure 3.** Figure 3: Number of gradient evaluations v.s. different [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗

**Figure 4.** Figure 4: The y-axis represents the value of max{|f(xk) − f ∗ |/|f ∗ |, [gi(xk)]+} and the x-axis indicates the number of gradient evaluations needed to satisfy the convergence criterion ut ≤ 10−3 . All experiments were conducted with the parameters β = 1, α = 1.365, and ν = 0.9. The graph on the left displays results for a convex QCQP with m = 10 and n = 1000, while the graph on the right shows results for a convex… view at source ↗

**Figure 5.** Figure 5: The y-axis represents the value of max{|f(xk) − f ∗ |/|f ∗ |, [gi(xk)]+} and the x-axis indicates the number of gradient evaluations or iterations needed to satisfy the convergence criteria ut ≤ 10−3 for our two algorithms ( gradient evaluations in the first graph) and complementary condition smaller than 10−3 for iALM (outer iterations in the second graph and number of gradient evaluations in the last gra… view at source ↗

**Figure 6.** Figure 6: The y-axis represents the value of max{|f(xk) − f ∗ |/|f ∗ |, [gi(xk)]+} and the x-axis indicates the number of gradient evaluations needed to satisfy the convergence criteria ut ≤ 10−3 for our two algorithms and complementary condition smaller than 10−3 for iALM. IFP: APL-based Inexact Fixed Point Method; TIS: APL-based Truncated Inexact Secant Method. References [1] M. ApS. Mosek optimization toolbox for… view at source ↗

read the original abstract

This paper presents new first-order methods for achieving optimal oracle complexities in convex optimization with convex functional constraints. Oracle complexities are measured by the number of function and gradient evaluations. To achieve this, we enable first-order methods to utilize computational oracles for solving diagonal quadratic programs in subproblems. For problems where the optimal value $f^*$ is known, such as those in overparameterized models and feasibility problems, we propose an accelerated first-order method that incorporates a modified Polyak step size and Nesterov's momentum. Notably, our method does not require knowledge of smoothness levels, H\"{o}lder continuity parameter of the gradient, or additional line search, yet achieves the optimal oracle complexity bound of $\mathcal{O}(\varepsilon^{-2/(1+3\rho)})$ under H\"{o}lder smoothness conditions. When $f^*$ is unknown, we reformulate the problem as finding the root of the optimal value function and develop inexact fixed-point iteration and secant method to compute $f^*$. These root-finding subproblems are solved inexactly using first-order methods to a specified relative accuracy. We employ the accelerated prox-level (APL) method, which is proven to be uniformly optimal for convex optimization with simple constraints. Our analysis demonstrates that APL-based level-set methods also achieve the optimal oracle complexity of $\mathcal{O}(\varepsilon^{-2/(1+3\rho)})$ for convex function-constrained optimization, without requiring knowledge of any problem-specific structures. Through experiments on various tasks, we demonstrate the advantages of our methods over existing approaches in function-constrained optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends APL to function-constrained convex problems with a parameter-free claim via modified Polyak steps and inexact root-finding, but the diagonal QP oracles are the load-bearing piece whose cost and parameter needs are left unspecified.

read the letter

The core new piece is a pair of methods that hit the optimal Hölder rate O(ε^{-2/(1+3ρ)}) for convex problems with convex functional constraints without knowing smoothness, the Hölder exponent, or running line search. When f* is known they use a modified Polyak step plus Nesterov momentum; when it is unknown they recast the problem as root-finding on the optimal-value function and solve it inexactly with first-order methods inside an APL level-set framework. Both routes are presented as uniformly optimal and parameter-free from standard first-order oracles plus the extra diagonal-QP oracles mentioned in the abstract. Experiments are reported to beat prior approaches on several tasks. That combination for the function-constrained setting is the actual increment over the cited APL work. The analysis appears to treat the QP oracles as black-box computational primitives whose calls do not count against the function/gradient budget and do not require knowledge of ρ. The soft spot is exactly there. The abstract gives no indication how those diagonal QPs are solved, whether their own subroutines are parameter-free, or how many extra evaluations they consume. If the oracles themselves need a known Hölder parameter or internal line searches, the parameter-free optimality claim does not follow from first-order oracles alone. The inexact fixed-point analysis for the root-finding case also needs to be checked for hidden dependencies on accuracy parameters. The paper is aimed at researchers who care about optimal first-order rates for constrained convex problems. It is technically engaged with the right literature and the claims are stated sharply enough that a serious referee should see it, even if the oracle accounting requires clarification or additional justification. I would send it to review.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces uniformly optimal and parameter-free first-order methods for convex optimization with convex functional constraints. When f* is known, it proposes an accelerated method with modified Polyak step size and Nesterov's momentum that achieves the optimal oracle complexity O(ε^{-2/(1+3ρ)}) under Hölder smoothness without knowledge of smoothness levels or ρ. When f* is unknown, the problem is recast as root-finding of the optimal value function; inexact fixed-point iteration and secant methods are combined with the APL method (previously shown optimal for simple constraints) to obtain the same complexity bound. Complexity is measured solely in function/gradient evaluations, while subproblems invoke external computational oracles for diagonal quadratic programs solved to relative accuracy.

Significance. If the central claims hold after clarification of the oracle costs, the work would advance the literature on parameter-free first-order methods by extending the APL framework to functional constraints while preserving optimality and avoiding line searches or parameter tuning. The explicit handling of both known and unknown f* cases, together with the experimental comparisons, would strengthen the practical utility for overparameterized models and feasibility problems.

major comments (2)

[Abstract] Abstract (and the paragraphs describing subproblem oracles): the claim that the methods remain parameter-free and achieve O(ε^{-2/(1+3ρ)}) while counting only function/gradient evaluations is load-bearing on the assumption that the diagonal quadratic program oracles can themselves be realized by parameter-free first-order procedures whose cost does not require knowledge of ρ or additional evaluations; the manuscript provides no explicit construction or complexity accounting for these oracles.
[Abstract] Abstract (description of inexact fixed-point iteration and secant method): the analysis asserts that root-finding subproblems can be solved inexactly to a specified relative accuracy by first-order methods without degrading the overall optimal complexity or the parameter-free property, yet no error-propagation bounds or inner-iteration counts are supplied to confirm that the inexactness preserves the Hölder-smoothness exploitation.

minor comments (1)

Notation for the Hölder parameter ρ and the relative accuracy tolerances in the root-finding procedures could be introduced with a single consolidated table or definition block to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on the abstract and subproblem analysis. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and explicit constructions.

read point-by-point responses

Referee: [Abstract] Abstract (and the paragraphs describing subproblem oracles): the claim that the methods remain parameter-free and achieve O(ε^{-2/(1+3ρ)}) while counting only function/gradient evaluations is load-bearing on the assumption that the diagonal quadratic program oracles can themselves be realized by parameter-free first-order procedures whose cost does not require knowledge of ρ or additional evaluations; the manuscript provides no explicit construction or complexity accounting for these oracles.

Authors: We agree that the current manuscript lacks an explicit construction and complexity accounting for the diagonal quadratic program oracles. These oracles arise in the level-set subproblems and, owing to their diagonal structure, admit closed-form solutions via coordinate-wise operations that require no knowledge of ρ and incur no additional function/gradient evaluations beyond those already counted. In the revision we will add a dedicated paragraph (and, if needed, a short appendix) supplying this construction together with a complexity argument showing that any first-order realization of the oracles contributes only a constant factor absorbed into the O(ε^{-2/(1+3ρ)}) bound. This will make the parameter-free claim fully rigorous while preserving the stated oracle complexity measured solely in function/gradient evaluations. revision: yes
Referee: [Abstract] Abstract (description of inexact fixed-point iteration and secant method): the analysis asserts that root-finding subproblems can be solved inexactly to a specified relative accuracy by first-order methods without degrading the overall optimal complexity or the parameter-free property, yet no error-propagation bounds or inner-iteration counts are supplied to confirm that the inexactness preserves the Hölder-smoothness exploitation.

Authors: We acknowledge that the current version does not supply explicit error-propagation bounds or inner-iteration counts for the inexact fixed-point iteration and secant method. The analysis already chooses the relative accuracy of each root-finding subproblem so that the overall rate is preserved, but the propagation details under Hölder smoothness are only sketched. In the revised manuscript we will insert a new lemma (with a short proof) that quantifies the propagation of inexactness, together with explicit inner-iteration bounds showing that the number of additional first-order steps remains compatible with the outer O(ε^{-2/(1+3ρ)}) complexity and does not require knowledge of ρ. This will confirm that both the optimal rate and the parameter-free property are retained. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to prior APL optimality; derivation remains independent

full rationale

The paper's central claims rest on extensions of the accelerated prox-level (APL) method, explicitly described as 'proven to be uniformly optimal for convex optimization with simple constraints' and employed for level-set methods to achieve O(ε^{-2/(1+3ρ)}) bounds. This constitutes a self-citation to prior work by overlapping authors (Lan et al.), but the citation is not load-bearing in a circular sense: the current paper's contributions (modified Polyak steps, inexact root-finding via first-order methods, and diagonal QP oracles) add independent structure without reducing any prediction or optimality result to a fit or definition internal to this manuscript. No self-definitional equations, fitted inputs renamed as predictions, or ansatz smuggling appear in the provided text. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard domain assumptions in convex optimization and the technical capability to solve certain quadratic subproblems; no free parameters are introduced as the methods are parameter-free.

axioms (3)

domain assumption Functions are convex.
Fundamental to the optimization setting.
domain assumption Gradient satisfies Hölder smoothness with parameter ρ.
Used for the complexity bound.
domain assumption Subproblems admit oracles for diagonal quadratic programs.
Enables the first-order methods as stated in the abstract.

pith-pipeline@v0.9.0 · 5817 in / 1413 out tokens · 28740 ms · 2026-05-23T07:39:38.463501+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Universal and Parameter-free Gradient Sliding for Composite Optimization
math.OC 2026-03 unverdicted novelty 7.0

PFUGS is the first parameter-free gradient sliding method for composite convex problems with unknown Hölder and Lipschitz constants, using O((M_ν/ε)^{2/(1+3ν)}) subgradient evaluations of f and O((L/ε)^{1/2}) gradient...

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

M. ApS. Mosek optimization toolbox for matlab.User’s Guide and Reference Manual, Version, 4(1), 2019

work page 2019
[2]

A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization.Mathematical Programming, 174:359–390, 2019

work page 2019
[3]

Ben-Tal and A

A. Ben-Tal and A. Nemirovski. Non-euclidean restricted memory level method for large-scale convex optimization. Mathematical Programming, 102:407–456, 2005

work page 2005
[4]

Bolte, T

J. Bolte, T. P. Nguyen, J. Peypouquet, and B. W. Suter. From error bounds to the complexity of first-order descent methods for convex functions.Mathematical Programming, 165:471–507, 2017

work page 2017
[5]

D. Boob, Q. Deng, and G. Lan. Stochastic first-order methods for convex and nonconvex functional constrained optimization. Mathematical Programming, pages 1–65, 2022

work page 2022
[6]

D. Boob, Q. Deng, and G. Lan. Level constrained first order methods for function constrained optimiza- tion. Mathematical Programming, pages 1–61, 2024

work page 2024
[7]

J. P. Boyd, Stephen. Subgradient methods. Notes for EE364b, Stanford University, Spring 2013–14, 2014

work page 2013
[8]

J. V. Burke and M. C. Ferris. Weak sharp minima in mathematical programming.SIAM Journal on Control and Optimization, 31(5):1340–1359, 1993

work page 1993
[9]

Chambolle and T

A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011

work page 2011
[10]

Projection-Free Functional Constrained Optimization for Risk Aversion and Sparsity Control

Y. Cheng, G. Lan, and H. E. Romeijn. Functional constrained optimization for risk aversion and sparsity control. arXiv preprint arXiv:2210.05108, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[11]

Cotter, H

A. Cotter, H. Jiang, M. Gupta, S. Wang, T. Narayan, S. You, and K. Sridharan. Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals.Journal of Machine Learning Research, 20(172):1–59, 2019

work page 2019
[12]

Davis, D

D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette. Subgradient methods for sharp weakly convex functions.Journal of Optimization Theory and Applications, 179:962–982, 2018

work page 2018
[13]

Devanathan and S

N. Devanathan and S. Boyd. Polyak minorant method for convex optimization.Journal of Optimization Theory and Applications, pages 1–20, 2024. 28

work page 2024
[14]

Guyon, Isabelle, Gunn, Steve, Ben-Hur, Asa, and D. Gideon. Arcene. UCI Machine Learning Repository,

work page
[15]

DOI: https://doi.org/10.24432/C58P55

work page doi:10.24432/c58p55
[16]

E. Y. Hamedani and N. S. Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems.SIAM Journal on Optimization, 31(2):1299–1329, 2021

work page 2021
[17]

Hazan and S

E. Hazan and S. Kakade. Revisiting the polyak step size.arXiv preprint arXiv:1905.00313, 2019

work page arXiv 1905
[18]

Jiang and X

R. Jiang and X. Li. Hölderian error bounds and kurdyka-łojasiewicz inequality for the trust region subproblem. Mathematics of Operations Research, 47(4):3025–3050, 2022

work page 2022
[19]

Koklu and I

M. Koklu and I. A. Özkan. Multiclass classification of dry beans using computer vision and machine learning techniques.Comput. Electron. Agric., 174:105507, 2020. URLhttps://api.semanticscholar. org/CorpusID:219762890

work page 2020
[20]

G. Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Mathematical Programming, 149(1-2):1–45, 2015

work page 2015
[21]

G. Lan. First-order and Stochastic Optimization Methods for Machine Learning. Springer, 2020

work page 2020
[22]

Lan and R

G. Lan and R. D. C. Monteiro. Iteration-complexity of first-order penalty methods for convex program- ming. Mathematical Programming, 138:115–139, 2013

work page 2013
[23]

Lemaréchal, A

C. Lemaréchal, A. S. Nemirovski, and Y. E. Nesterov. New variants of bundle methods.Mathematical Programming, 69:111–148, 1995

work page 1995
[24]

G. Li. Global error bounds for piecewise convex polynomials.Mathematical programming, 137(1):37–64, 2013

work page 2013
[25]

Li and G

T. Li and G. Lan. A simple uniformly optimal method without line search for convex optimization. arXiv preprint arXiv:2310.10082, 2023

work page arXiv 2023
[26]

Q. Lin, S. Nadarajah, and N. Soheili. A level-set method for convex optimization with a feasible solution path. SIAM Journal on Optimization, 28(4):3290–3311, 2018

work page 2018
[27]

Lin and Q

Z. Lin and Q. Deng. Efficient first-order methods for convex optimization with strongly convex function constraints. arXiv preprint arXiv:2212.11143, 2022

work page arXiv 2022
[28]

Loizou, S

N. Loizou, S. Vaswani, I. H. Laradji, and S. Lacoste-Julien. Stochastic polyak step-size for sgd: An adaptive learning rate for fast convergence. InInternational Conference on Artificial Intelligence and Statistics, pages 1306–1314. PMLR, 2021

work page 2021
[29]

B. S. Mordukhovich and N. M. Nam.Convex analysis and beyond. Springer, 2022

work page 2022
[30]

Necoara, Y

I. Necoara, Y. Nesterov, and F. Glineur. Linear convergence of first order methods for non-strongly convex optimization. Mathematical Programming, 175(1-2):69–107, 2019. ISSN 0025-5610. doi: 10. 1007/s10107-018-1232-1

work page 2019
[31]

Nesterov

Y. Nesterov. A method for solving the convex programming problem with convergence rate o (1/k2). In Dokl akad nauk Sssr, volume 269, page 543, 1983

work page 1983
[32]

Nesterov

Y. Nesterov. Gradient methods for minimizing composite functions.Mathematical Programming, 140 (1):125–161, 2013. doi: 10.1007/s10107-012-0629-5

work page doi:10.1007/s10107-012-0629-5 2013
[33]

Nesterov

Y. Nesterov. Universal gradient methods for convex optimization problems.Mathematical Programming, 152(1):381–404, 2015

work page 2015
[34]

Nesterov et al.Lectures on convex optimization, volume 137

Y. Nesterov et al.Lectures on convex optimization, volume 137. Springer, 2018

work page 2018
[35]

K. F. Ng and X. Y. Zheng. Global error bounds with fractional exponents.Mathematical programming, 88:357–370, 2000. 29

work page 2000
[36]

B. T. Polyak. Introduction to optimization. 1987

work page 1987
[37]

Rigollet and X

P. Rigollet and X. Tong. Neyman-pearson classification, convexity and stochastic constraints.Journal of Machine Learning Research, 2011

work page 2011
[38]

R. T. Rockafellar and R. J.-B. Wets.Variational analysis, volume 317. Springer Science & Business Media, 2009

work page 2009
[39]

X. Tong, Y. Feng, and A. Zhao. A survey on neyman-pearson classification and suggestions for future research. Wiley Interdisciplinary Reviews: Computational Statistics, 8(2):64–81, 2016

work page 2016
[40]

P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. submitted to SIAM Journal on Optimization, 2(3), 2008

work page 2008
[41]

H. H. Vui. Global holderian error bound for nondegenerate polynomials.SIAM Journal on Optimization, 23(2):917–933, 2013

work page 2013
[42]

Y. Xu. Iteration complexity of inexact augmented lagrangian methods for constrained convex program- ming. Mathematical Programming, 185:199–244, 2021

work page 2021
[43]

Zhang and G

Z. Zhang and G. Lan. Solving convex smooth function constrained optimization is almost as easy as unconstrained optimization. arXiv preprint arXiv:2210.05807, 2022. A Missing Proofs A.1 Proof of Theorem 4.2 Let δs = ¯us − ¯ls, we haveδs+1 ≤ νδs. We shall consider the following two cases. First, suppose V (η) ≥ 1 2 ε. Then, it takes at mostS = max n 0, l l...

work page arXiv 2022

[1] [1]

M. ApS. Mosek optimization toolbox for matlab.User’s Guide and Reference Manual, Version, 4(1), 2019

work page 2019

[2] [2]

A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization.Mathematical Programming, 174:359–390, 2019

work page 2019

[3] [3]

Ben-Tal and A

A. Ben-Tal and A. Nemirovski. Non-euclidean restricted memory level method for large-scale convex optimization. Mathematical Programming, 102:407–456, 2005

work page 2005

[4] [4]

Bolte, T

J. Bolte, T. P. Nguyen, J. Peypouquet, and B. W. Suter. From error bounds to the complexity of first-order descent methods for convex functions.Mathematical Programming, 165:471–507, 2017

work page 2017

[5] [5]

D. Boob, Q. Deng, and G. Lan. Stochastic first-order methods for convex and nonconvex functional constrained optimization. Mathematical Programming, pages 1–65, 2022

work page 2022

[6] [6]

D. Boob, Q. Deng, and G. Lan. Level constrained first order methods for function constrained optimiza- tion. Mathematical Programming, pages 1–61, 2024

work page 2024

[7] [7]

J. P. Boyd, Stephen. Subgradient methods. Notes for EE364b, Stanford University, Spring 2013–14, 2014

work page 2013

[8] [8]

J. V. Burke and M. C. Ferris. Weak sharp minima in mathematical programming.SIAM Journal on Control and Optimization, 31(5):1340–1359, 1993

work page 1993

[9] [9]

Chambolle and T

A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011

work page 2011

[10] [10]

Projection-Free Functional Constrained Optimization for Risk Aversion and Sparsity Control

Y. Cheng, G. Lan, and H. E. Romeijn. Functional constrained optimization for risk aversion and sparsity control. arXiv preprint arXiv:2210.05108, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[11] [11]

Cotter, H

A. Cotter, H. Jiang, M. Gupta, S. Wang, T. Narayan, S. You, and K. Sridharan. Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals.Journal of Machine Learning Research, 20(172):1–59, 2019

work page 2019

[12] [12]

Davis, D

D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette. Subgradient methods for sharp weakly convex functions.Journal of Optimization Theory and Applications, 179:962–982, 2018

work page 2018

[13] [13]

Devanathan and S

N. Devanathan and S. Boyd. Polyak minorant method for convex optimization.Journal of Optimization Theory and Applications, pages 1–20, 2024. 28

work page 2024

[14] [14]

Guyon, Isabelle, Gunn, Steve, Ben-Hur, Asa, and D. Gideon. Arcene. UCI Machine Learning Repository,

work page

[15] [15]

DOI: https://doi.org/10.24432/C58P55

work page doi:10.24432/c58p55

[16] [16]

E. Y. Hamedani and N. S. Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems.SIAM Journal on Optimization, 31(2):1299–1329, 2021

work page 2021

[17] [17]

Hazan and S

E. Hazan and S. Kakade. Revisiting the polyak step size.arXiv preprint arXiv:1905.00313, 2019

work page arXiv 1905

[18] [18]

Jiang and X

R. Jiang and X. Li. Hölderian error bounds and kurdyka-łojasiewicz inequality for the trust region subproblem. Mathematics of Operations Research, 47(4):3025–3050, 2022

work page 2022

[19] [19]

Koklu and I

M. Koklu and I. A. Özkan. Multiclass classification of dry beans using computer vision and machine learning techniques.Comput. Electron. Agric., 174:105507, 2020. URLhttps://api.semanticscholar. org/CorpusID:219762890

work page 2020

[20] [20]

G. Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Mathematical Programming, 149(1-2):1–45, 2015

work page 2015

[21] [21]

G. Lan. First-order and Stochastic Optimization Methods for Machine Learning. Springer, 2020

work page 2020

[22] [22]

Lan and R

G. Lan and R. D. C. Monteiro. Iteration-complexity of first-order penalty methods for convex program- ming. Mathematical Programming, 138:115–139, 2013

work page 2013

[23] [23]

Lemaréchal, A

C. Lemaréchal, A. S. Nemirovski, and Y. E. Nesterov. New variants of bundle methods.Mathematical Programming, 69:111–148, 1995

work page 1995

[24] [24]

G. Li. Global error bounds for piecewise convex polynomials.Mathematical programming, 137(1):37–64, 2013

work page 2013

[25] [25]

Li and G

T. Li and G. Lan. A simple uniformly optimal method without line search for convex optimization. arXiv preprint arXiv:2310.10082, 2023

work page arXiv 2023

[26] [26]

Q. Lin, S. Nadarajah, and N. Soheili. A level-set method for convex optimization with a feasible solution path. SIAM Journal on Optimization, 28(4):3290–3311, 2018

work page 2018

[27] [27]

Lin and Q

Z. Lin and Q. Deng. Efficient first-order methods for convex optimization with strongly convex function constraints. arXiv preprint arXiv:2212.11143, 2022

work page arXiv 2022

[28] [28]

Loizou, S

N. Loizou, S. Vaswani, I. H. Laradji, and S. Lacoste-Julien. Stochastic polyak step-size for sgd: An adaptive learning rate for fast convergence. InInternational Conference on Artificial Intelligence and Statistics, pages 1306–1314. PMLR, 2021

work page 2021

[29] [29]

B. S. Mordukhovich and N. M. Nam.Convex analysis and beyond. Springer, 2022

work page 2022

[30] [30]

Necoara, Y

I. Necoara, Y. Nesterov, and F. Glineur. Linear convergence of first order methods for non-strongly convex optimization. Mathematical Programming, 175(1-2):69–107, 2019. ISSN 0025-5610. doi: 10. 1007/s10107-018-1232-1

work page 2019

[31] [31]

Nesterov

Y. Nesterov. A method for solving the convex programming problem with convergence rate o (1/k2). In Dokl akad nauk Sssr, volume 269, page 543, 1983

work page 1983

[32] [32]

Nesterov

Y. Nesterov. Gradient methods for minimizing composite functions.Mathematical Programming, 140 (1):125–161, 2013. doi: 10.1007/s10107-012-0629-5

work page doi:10.1007/s10107-012-0629-5 2013

[33] [33]

Nesterov

Y. Nesterov. Universal gradient methods for convex optimization problems.Mathematical Programming, 152(1):381–404, 2015

work page 2015

[34] [34]

Nesterov et al.Lectures on convex optimization, volume 137

Y. Nesterov et al.Lectures on convex optimization, volume 137. Springer, 2018

work page 2018

[35] [35]

K. F. Ng and X. Y. Zheng. Global error bounds with fractional exponents.Mathematical programming, 88:357–370, 2000. 29

work page 2000

[36] [36]

B. T. Polyak. Introduction to optimization. 1987

work page 1987

[37] [37]

Rigollet and X

P. Rigollet and X. Tong. Neyman-pearson classification, convexity and stochastic constraints.Journal of Machine Learning Research, 2011

work page 2011

[38] [38]

R. T. Rockafellar and R. J.-B. Wets.Variational analysis, volume 317. Springer Science & Business Media, 2009

work page 2009

[39] [39]

X. Tong, Y. Feng, and A. Zhao. A survey on neyman-pearson classification and suggestions for future research. Wiley Interdisciplinary Reviews: Computational Statistics, 8(2):64–81, 2016

work page 2016

[40] [40]

P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. submitted to SIAM Journal on Optimization, 2(3), 2008

work page 2008

[41] [41]

H. H. Vui. Global holderian error bound for nondegenerate polynomials.SIAM Journal on Optimization, 23(2):917–933, 2013

work page 2013

[42] [42]

Y. Xu. Iteration complexity of inexact augmented lagrangian methods for constrained convex program- ming. Mathematical Programming, 185:199–244, 2021

work page 2021

[43] [43]

Zhang and G

Z. Zhang and G. Lan. Solving convex smooth function constrained optimization is almost as easy as unconstrained optimization. arXiv preprint arXiv:2210.05807, 2022. A Missing Proofs A.1 Proof of Theorem 4.2 Let δs = ¯us − ¯ls, we haveδs+1 ≤ νδs. We shall consider the following two cases. First, suppose V (η) ≥ 1 2 ε. Then, it takes at mostS = max n 0, l l...

work page arXiv 2022