Uniformly Optimal and Parameter-free First-order Methods for Convex and Function-constrained Optimization
Pith reviewed 2026-05-23 07:39 UTC · model grok-4.3
The pith
A parameter-free first-order method achieves the optimal oracle complexity for convex optimization with functional constraints under Hölder smoothness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that first-order methods can be made uniformly optimal and parameter-free for convex optimization with convex functional constraints by enabling oracles for diagonal quadratic programs in subproblems; when the optimal value f* is known the method uses a modified Polyak stepsize with Nesterov momentum, and when f* is unknown the problem is recast as root-finding solved via inexact fixed-point iteration and the accelerated prox-level method, both attaining the bound O(ε^{-2/(1+3ρ)}) without knowledge of smoothness or Hölder parameters.
What carries the argument
The accelerated prox-level (APL) method used inside level-set reformulations when f* is unknown, together with the modified Polyak stepsize and Nesterov momentum when f* is known.
If this is right
- The methods apply directly to feasibility problems and overparameterized models where f* is known.
- Root-finding subproblems are solved to relative accuracy by first-order methods without losing the overall optimal rate.
- The same complexity bound holds for convex function-constrained optimization as for problems with simple constraints.
- No line search or prior structural knowledge is needed to reach the Hölder-optimal oracle complexity.
Where Pith is reading between the lines
- Implementation in machine learning pipelines could become simpler because no separate estimation of smoothness constants is required.
- The root-finding reduction suggests that similar inexact fixed-point schemes might produce parameter-free variants for other classes of constrained problems.
- The approach separates the outer root search from the inner first-order solver, which could allow modular replacement of the inner solver while preserving optimality.
Load-bearing premise
The subproblems can be solved using computational oracles for diagonal quadratic programs and inexact first-order methods to a specified relative accuracy.
What would settle it
A concrete counter-example or numerical test on a Hölder-smooth function-constrained problem in which the observed number of gradient evaluations exceeds the stated O(ε^{-2/(1+3ρ)}) bound once all smoothness and Hölder parameters are hidden from the algorithm.
Figures
read the original abstract
This paper presents new first-order methods for achieving optimal oracle complexities in convex optimization with convex functional constraints. Oracle complexities are measured by the number of function and gradient evaluations. To achieve this, we enable first-order methods to utilize computational oracles for solving diagonal quadratic programs in subproblems. For problems where the optimal value $f^*$ is known, such as those in overparameterized models and feasibility problems, we propose an accelerated first-order method that incorporates a modified Polyak step size and Nesterov's momentum. Notably, our method does not require knowledge of smoothness levels, H\"{o}lder continuity parameter of the gradient, or additional line search, yet achieves the optimal oracle complexity bound of $\mathcal{O}(\varepsilon^{-2/(1+3\rho)})$ under H\"{o}lder smoothness conditions. When $f^*$ is unknown, we reformulate the problem as finding the root of the optimal value function and develop inexact fixed-point iteration and secant method to compute $f^*$. These root-finding subproblems are solved inexactly using first-order methods to a specified relative accuracy. We employ the accelerated prox-level (APL) method, which is proven to be uniformly optimal for convex optimization with simple constraints. Our analysis demonstrates that APL-based level-set methods also achieve the optimal oracle complexity of $\mathcal{O}(\varepsilon^{-2/(1+3\rho)})$ for convex function-constrained optimization, without requiring knowledge of any problem-specific structures. Through experiments on various tasks, we demonstrate the advantages of our methods over existing approaches in function-constrained optimization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces uniformly optimal and parameter-free first-order methods for convex optimization with convex functional constraints. When f* is known, it proposes an accelerated method with modified Polyak step size and Nesterov's momentum that achieves the optimal oracle complexity O(ε^{-2/(1+3ρ)}) under Hölder smoothness without knowledge of smoothness levels or ρ. When f* is unknown, the problem is recast as root-finding of the optimal value function; inexact fixed-point iteration and secant methods are combined with the APL method (previously shown optimal for simple constraints) to obtain the same complexity bound. Complexity is measured solely in function/gradient evaluations, while subproblems invoke external computational oracles for diagonal quadratic programs solved to relative accuracy.
Significance. If the central claims hold after clarification of the oracle costs, the work would advance the literature on parameter-free first-order methods by extending the APL framework to functional constraints while preserving optimality and avoiding line searches or parameter tuning. The explicit handling of both known and unknown f* cases, together with the experimental comparisons, would strengthen the practical utility for overparameterized models and feasibility problems.
major comments (2)
- [Abstract] Abstract (and the paragraphs describing subproblem oracles): the claim that the methods remain parameter-free and achieve O(ε^{-2/(1+3ρ)}) while counting only function/gradient evaluations is load-bearing on the assumption that the diagonal quadratic program oracles can themselves be realized by parameter-free first-order procedures whose cost does not require knowledge of ρ or additional evaluations; the manuscript provides no explicit construction or complexity accounting for these oracles.
- [Abstract] Abstract (description of inexact fixed-point iteration and secant method): the analysis asserts that root-finding subproblems can be solved inexactly to a specified relative accuracy by first-order methods without degrading the overall optimal complexity or the parameter-free property, yet no error-propagation bounds or inner-iteration counts are supplied to confirm that the inexactness preserves the Hölder-smoothness exploitation.
minor comments (1)
- Notation for the Hölder parameter ρ and the relative accuracy tolerances in the root-finding procedures could be introduced with a single consolidated table or definition block to improve readability.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on the abstract and subproblem analysis. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and explicit constructions.
read point-by-point responses
-
Referee: [Abstract] Abstract (and the paragraphs describing subproblem oracles): the claim that the methods remain parameter-free and achieve O(ε^{-2/(1+3ρ)}) while counting only function/gradient evaluations is load-bearing on the assumption that the diagonal quadratic program oracles can themselves be realized by parameter-free first-order procedures whose cost does not require knowledge of ρ or additional evaluations; the manuscript provides no explicit construction or complexity accounting for these oracles.
Authors: We agree that the current manuscript lacks an explicit construction and complexity accounting for the diagonal quadratic program oracles. These oracles arise in the level-set subproblems and, owing to their diagonal structure, admit closed-form solutions via coordinate-wise operations that require no knowledge of ρ and incur no additional function/gradient evaluations beyond those already counted. In the revision we will add a dedicated paragraph (and, if needed, a short appendix) supplying this construction together with a complexity argument showing that any first-order realization of the oracles contributes only a constant factor absorbed into the O(ε^{-2/(1+3ρ)}) bound. This will make the parameter-free claim fully rigorous while preserving the stated oracle complexity measured solely in function/gradient evaluations. revision: yes
-
Referee: [Abstract] Abstract (description of inexact fixed-point iteration and secant method): the analysis asserts that root-finding subproblems can be solved inexactly to a specified relative accuracy by first-order methods without degrading the overall optimal complexity or the parameter-free property, yet no error-propagation bounds or inner-iteration counts are supplied to confirm that the inexactness preserves the Hölder-smoothness exploitation.
Authors: We acknowledge that the current version does not supply explicit error-propagation bounds or inner-iteration counts for the inexact fixed-point iteration and secant method. The analysis already chooses the relative accuracy of each root-finding subproblem so that the overall rate is preserved, but the propagation details under Hölder smoothness are only sketched. In the revised manuscript we will insert a new lemma (with a short proof) that quantifies the propagation of inexactness, together with explicit inner-iteration bounds showing that the number of additional first-order steps remains compatible with the outer O(ε^{-2/(1+3ρ)}) complexity and does not require knowledge of ρ. This will confirm that both the optimal rate and the parameter-free property are retained. revision: yes
Circularity Check
Minor self-citation to prior APL optimality; derivation remains independent
full rationale
The paper's central claims rest on extensions of the accelerated prox-level (APL) method, explicitly described as 'proven to be uniformly optimal for convex optimization with simple constraints' and employed for level-set methods to achieve O(ε^{-2/(1+3ρ)}) bounds. This constitutes a self-citation to prior work by overlapping authors (Lan et al.), but the citation is not load-bearing in a circular sense: the current paper's contributions (modified Polyak steps, inexact root-finding via first-order methods, and diagonal QP oracles) add independent structure without reducing any prediction or optimality result to a fit or definition internal to this manuscript. No self-definitional equations, fitted inputs renamed as predictions, or ansatz smuggling appear in the provided text. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption Functions are convex.
- domain assumption Gradient satisfies Hölder smoothness with parameter ρ.
- domain assumption Subproblems admit oracles for diagonal quadratic programs.
Forward citations
Cited by 1 Pith paper
-
Universal and Parameter-free Gradient Sliding for Composite Optimization
PFUGS is the first parameter-free gradient sliding method for composite convex problems with unknown Hölder and Lipschitz constants, using O((M_ν/ε)^{2/(1+3ν)}) subgradient evaluations of f and O((L/ε)^{1/2}) gradient...
Reference graph
Works this paper leans on
-
[1]
M. ApS. Mosek optimization toolbox for matlab.User’s Guide and Reference Manual, Version, 4(1), 2019
work page 2019
-
[2]
A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization.Mathematical Programming, 174:359–390, 2019
work page 2019
-
[3]
A. Ben-Tal and A. Nemirovski. Non-euclidean restricted memory level method for large-scale convex optimization. Mathematical Programming, 102:407–456, 2005
work page 2005
- [4]
-
[5]
D. Boob, Q. Deng, and G. Lan. Stochastic first-order methods for convex and nonconvex functional constrained optimization. Mathematical Programming, pages 1–65, 2022
work page 2022
-
[6]
D. Boob, Q. Deng, and G. Lan. Level constrained first order methods for function constrained optimiza- tion. Mathematical Programming, pages 1–61, 2024
work page 2024
-
[7]
J. P. Boyd, Stephen. Subgradient methods. Notes for EE364b, Stanford University, Spring 2013–14, 2014
work page 2013
-
[8]
J. V. Burke and M. C. Ferris. Weak sharp minima in mathematical programming.SIAM Journal on Control and Optimization, 31(5):1340–1359, 1993
work page 1993
-
[9]
A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011
work page 2011
-
[10]
Projection-Free Functional Constrained Optimization for Risk Aversion and Sparsity Control
Y. Cheng, G. Lan, and H. E. Romeijn. Functional constrained optimization for risk aversion and sparsity control. arXiv preprint arXiv:2210.05108, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
- [11]
- [12]
-
[13]
N. Devanathan and S. Boyd. Polyak minorant method for convex optimization.Journal of Optimization Theory and Applications, pages 1–20, 2024. 28
work page 2024
-
[14]
Guyon, Isabelle, Gunn, Steve, Ben-Hur, Asa, and D. Gideon. Arcene. UCI Machine Learning Repository,
-
[15]
DOI: https://doi.org/10.24432/C58P55
-
[16]
E. Y. Hamedani and N. S. Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems.SIAM Journal on Optimization, 31(2):1299–1329, 2021
work page 2021
-
[17]
E. Hazan and S. Kakade. Revisiting the polyak step size.arXiv preprint arXiv:1905.00313, 2019
-
[18]
R. Jiang and X. Li. Hölderian error bounds and kurdyka-łojasiewicz inequality for the trust region subproblem. Mathematics of Operations Research, 47(4):3025–3050, 2022
work page 2022
-
[19]
M. Koklu and I. A. Özkan. Multiclass classification of dry beans using computer vision and machine learning techniques.Comput. Electron. Agric., 174:105507, 2020. URLhttps://api.semanticscholar. org/CorpusID:219762890
work page 2020
-
[20]
G. Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Mathematical Programming, 149(1-2):1–45, 2015
work page 2015
-
[21]
G. Lan. First-order and Stochastic Optimization Methods for Machine Learning. Springer, 2020
work page 2020
- [22]
-
[23]
C. Lemaréchal, A. S. Nemirovski, and Y. E. Nesterov. New variants of bundle methods.Mathematical Programming, 69:111–148, 1995
work page 1995
-
[24]
G. Li. Global error bounds for piecewise convex polynomials.Mathematical programming, 137(1):37–64, 2013
work page 2013
- [25]
-
[26]
Q. Lin, S. Nadarajah, and N. Soheili. A level-set method for convex optimization with a feasible solution path. SIAM Journal on Optimization, 28(4):3290–3311, 2018
work page 2018
- [27]
- [28]
-
[29]
B. S. Mordukhovich and N. M. Nam.Convex analysis and beyond. Springer, 2022
work page 2022
-
[30]
I. Necoara, Y. Nesterov, and F. Glineur. Linear convergence of first order methods for non-strongly convex optimization. Mathematical Programming, 175(1-2):69–107, 2019. ISSN 0025-5610. doi: 10. 1007/s10107-018-1232-1
work page 2019
- [31]
-
[32]
Y. Nesterov. Gradient methods for minimizing composite functions.Mathematical Programming, 140 (1):125–161, 2013. doi: 10.1007/s10107-012-0629-5
- [33]
-
[34]
Nesterov et al.Lectures on convex optimization, volume 137
Y. Nesterov et al.Lectures on convex optimization, volume 137. Springer, 2018
work page 2018
-
[35]
K. F. Ng and X. Y. Zheng. Global error bounds with fractional exponents.Mathematical programming, 88:357–370, 2000. 29
work page 2000
-
[36]
B. T. Polyak. Introduction to optimization. 1987
work page 1987
-
[37]
P. Rigollet and X. Tong. Neyman-pearson classification, convexity and stochastic constraints.Journal of Machine Learning Research, 2011
work page 2011
-
[38]
R. T. Rockafellar and R. J.-B. Wets.Variational analysis, volume 317. Springer Science & Business Media, 2009
work page 2009
-
[39]
X. Tong, Y. Feng, and A. Zhao. A survey on neyman-pearson classification and suggestions for future research. Wiley Interdisciplinary Reviews: Computational Statistics, 8(2):64–81, 2016
work page 2016
-
[40]
P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. submitted to SIAM Journal on Optimization, 2(3), 2008
work page 2008
-
[41]
H. H. Vui. Global holderian error bound for nondegenerate polynomials.SIAM Journal on Optimization, 23(2):917–933, 2013
work page 2013
-
[42]
Y. Xu. Iteration complexity of inexact augmented lagrangian methods for constrained convex program- ming. Mathematical Programming, 185:199–244, 2021
work page 2021
-
[43]
Z. Zhang and G. Lan. Solving convex smooth function constrained optimization is almost as easy as unconstrained optimization. arXiv preprint arXiv:2210.05807, 2022. A Missing Proofs A.1 Proof of Theorem 4.2 Let δs = ¯us − ¯ls, we haveδs+1 ≤ νδs. We shall consider the following two cases. First, suppose V (η) ≥ 1 2 ε. Then, it takes at mostS = max n 0, l l...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.