pith. sign in

arxiv: 2412.06319 · v3 · submitted 2024-12-09 · 🧮 math.OC

Uniformly Optimal and Parameter-free First-order Methods for Convex and Function-constrained Optimization

Pith reviewed 2026-05-23 07:39 UTC · model grok-4.3

classification 🧮 math.OC
keywords first-order methodsconvex optimizationfunctional constraintsparameter-freeoptimal oracle complexityHölder smoothnessaccelerated methodslevel-set methods
0
0 comments X

The pith

A parameter-free first-order method achieves the optimal oracle complexity for convex optimization with functional constraints under Hölder smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops first-order methods for convex optimization problems that include convex functional constraints. These methods are designed to be parameter-free, meaning they do not require knowledge of smoothness levels or Hölder continuity parameters, and they avoid line searches. For problems where the optimal objective value is known, an accelerated method with a modified Polyak step size and Nesterov's momentum is proposed. When the optimal value is unknown, the problem is reformulated as a root-finding task solved using the accelerated prox-level method. In both cases, the methods attain the optimal rate of O(ε^{-2/(1+3ρ)}) under Hölder smoothness assumptions using only function and gradient evaluations.

Core claim

The central claim is that first-order methods can be made uniformly optimal and parameter-free for convex optimization with convex functional constraints by enabling oracles for diagonal quadratic programs in subproblems; when the optimal value f* is known the method uses a modified Polyak stepsize with Nesterov momentum, and when f* is unknown the problem is recast as root-finding solved via inexact fixed-point iteration and the accelerated prox-level method, both attaining the bound O(ε^{-2/(1+3ρ)}) without knowledge of smoothness or Hölder parameters.

What carries the argument

The accelerated prox-level (APL) method used inside level-set reformulations when f* is unknown, together with the modified Polyak stepsize and Nesterov momentum when f* is known.

If this is right

  • The methods apply directly to feasibility problems and overparameterized models where f* is known.
  • Root-finding subproblems are solved to relative accuracy by first-order methods without losing the overall optimal rate.
  • The same complexity bound holds for convex function-constrained optimization as for problems with simple constraints.
  • No line search or prior structural knowledge is needed to reach the Hölder-optimal oracle complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Implementation in machine learning pipelines could become simpler because no separate estimation of smoothness constants is required.
  • The root-finding reduction suggests that similar inexact fixed-point schemes might produce parameter-free variants for other classes of constrained problems.
  • The approach separates the outer root search from the inner first-order solver, which could allow modular replacement of the inner solver while preserving optimality.

Load-bearing premise

The subproblems can be solved using computational oracles for diagonal quadratic programs and inexact first-order methods to a specified relative accuracy.

What would settle it

A concrete counter-example or numerical test on a Hölder-smooth function-constrained problem in which the observed number of gradient evaluations exceeds the stated O(ε^{-2/(1+3ρ)}) bound once all smoothness and Hölder parameters are hidden from the algorithm.

Figures

Figures reproduced from arXiv: 2412.06319 by Guanghui Lan, Qi Deng, Zhenwei Lin.

Figure 1
Figure 1. Figure 1: Objective value (dK(uk) + dK(sk)) v.s. number of gradient evaluations in the SOCP problem. Left is 500 variables, 200 equality constraints and 10 cones each of dimension 50, right is 1000 variables, 800 equality constraints and 10 cones each of dimension 100 5.1 SOCPs In this section, we solve standard SOCP, which can be equivalently expressed as the following penalty problem [13]: min x dK(u) + dK∗ (s) s.… view at source ↗
Figure 2
Figure 2. Figure 2: Objective value (∥[σmax(I−Xk)]+∥+∥[σmax(A⊤ i Xk+XkA⊤ i )]+∥) v.s. number of gradient evaluations in LMI. Left is LMI problem with parameter (q, k) = (20, 10) and right is LMI problem with parameter (q, k) = (40, 20). 5.2 LMIs The second experiment focuses on a Linear Matrix Inequality (LMI) problem, which seeks a symmetric matrix X ∈ S q×q such that the following inequalities hold: X ⪰ I, A⊤ i X + XAi ⪯ 0,… view at source ↗
Figure 3
Figure 3. Figure 3: Number of gradient evaluations v.s. different [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The y-axis represents the value of max{|f(xk) − f ∗ |/|f ∗ |, [gi(xk)]+} and the x-axis indicates the number of gradient evaluations needed to satisfy the convergence criterion ut ≤ 10−3 . All experiments were conducted with the parameters β = 1, α = 1.365, and ν = 0.9. The graph on the left displays results for a convex QCQP with m = 10 and n = 1000, while the graph on the right shows results for a convex… view at source ↗
Figure 5
Figure 5. Figure 5: The y-axis represents the value of max{|f(xk) − f ∗ |/|f ∗ |, [gi(xk)]+} and the x-axis indicates the number of gradient evaluations or iterations needed to satisfy the convergence criteria ut ≤ 10−3 for our two algorithms ( gradient evaluations in the first graph) and complementary condition smaller than 10−3 for iALM (outer iterations in the second graph and number of gradient evaluations in the last gra… view at source ↗
Figure 6
Figure 6. Figure 6: The y-axis represents the value of max{|f(xk) − f ∗ |/|f ∗ |, [gi(xk)]+} and the x-axis indicates the number of gradient evaluations needed to satisfy the convergence criteria ut ≤ 10−3 for our two algorithms and complementary condition smaller than 10−3 for iALM. IFP: APL-based Inexact Fixed Point Method; TIS: APL-based Truncated Inexact Secant Method. References [1] M. ApS. Mosek optimization toolbox for… view at source ↗
read the original abstract

This paper presents new first-order methods for achieving optimal oracle complexities in convex optimization with convex functional constraints. Oracle complexities are measured by the number of function and gradient evaluations. To achieve this, we enable first-order methods to utilize computational oracles for solving diagonal quadratic programs in subproblems. For problems where the optimal value $f^*$ is known, such as those in overparameterized models and feasibility problems, we propose an accelerated first-order method that incorporates a modified Polyak step size and Nesterov's momentum. Notably, our method does not require knowledge of smoothness levels, H\"{o}lder continuity parameter of the gradient, or additional line search, yet achieves the optimal oracle complexity bound of $\mathcal{O}(\varepsilon^{-2/(1+3\rho)})$ under H\"{o}lder smoothness conditions. When $f^*$ is unknown, we reformulate the problem as finding the root of the optimal value function and develop inexact fixed-point iteration and secant method to compute $f^*$. These root-finding subproblems are solved inexactly using first-order methods to a specified relative accuracy. We employ the accelerated prox-level (APL) method, which is proven to be uniformly optimal for convex optimization with simple constraints. Our analysis demonstrates that APL-based level-set methods also achieve the optimal oracle complexity of $\mathcal{O}(\varepsilon^{-2/(1+3\rho)})$ for convex function-constrained optimization, without requiring knowledge of any problem-specific structures. Through experiments on various tasks, we demonstrate the advantages of our methods over existing approaches in function-constrained optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces uniformly optimal and parameter-free first-order methods for convex optimization with convex functional constraints. When f* is known, it proposes an accelerated method with modified Polyak step size and Nesterov's momentum that achieves the optimal oracle complexity O(ε^{-2/(1+3ρ)}) under Hölder smoothness without knowledge of smoothness levels or ρ. When f* is unknown, the problem is recast as root-finding of the optimal value function; inexact fixed-point iteration and secant methods are combined with the APL method (previously shown optimal for simple constraints) to obtain the same complexity bound. Complexity is measured solely in function/gradient evaluations, while subproblems invoke external computational oracles for diagonal quadratic programs solved to relative accuracy.

Significance. If the central claims hold after clarification of the oracle costs, the work would advance the literature on parameter-free first-order methods by extending the APL framework to functional constraints while preserving optimality and avoiding line searches or parameter tuning. The explicit handling of both known and unknown f* cases, together with the experimental comparisons, would strengthen the practical utility for overparameterized models and feasibility problems.

major comments (2)
  1. [Abstract] Abstract (and the paragraphs describing subproblem oracles): the claim that the methods remain parameter-free and achieve O(ε^{-2/(1+3ρ)}) while counting only function/gradient evaluations is load-bearing on the assumption that the diagonal quadratic program oracles can themselves be realized by parameter-free first-order procedures whose cost does not require knowledge of ρ or additional evaluations; the manuscript provides no explicit construction or complexity accounting for these oracles.
  2. [Abstract] Abstract (description of inexact fixed-point iteration and secant method): the analysis asserts that root-finding subproblems can be solved inexactly to a specified relative accuracy by first-order methods without degrading the overall optimal complexity or the parameter-free property, yet no error-propagation bounds or inner-iteration counts are supplied to confirm that the inexactness preserves the Hölder-smoothness exploitation.
minor comments (1)
  1. Notation for the Hölder parameter ρ and the relative accuracy tolerances in the root-finding procedures could be introduced with a single consolidated table or definition block to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on the abstract and subproblem analysis. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and explicit constructions.

read point-by-point responses
  1. Referee: [Abstract] Abstract (and the paragraphs describing subproblem oracles): the claim that the methods remain parameter-free and achieve O(ε^{-2/(1+3ρ)}) while counting only function/gradient evaluations is load-bearing on the assumption that the diagonal quadratic program oracles can themselves be realized by parameter-free first-order procedures whose cost does not require knowledge of ρ or additional evaluations; the manuscript provides no explicit construction or complexity accounting for these oracles.

    Authors: We agree that the current manuscript lacks an explicit construction and complexity accounting for the diagonal quadratic program oracles. These oracles arise in the level-set subproblems and, owing to their diagonal structure, admit closed-form solutions via coordinate-wise operations that require no knowledge of ρ and incur no additional function/gradient evaluations beyond those already counted. In the revision we will add a dedicated paragraph (and, if needed, a short appendix) supplying this construction together with a complexity argument showing that any first-order realization of the oracles contributes only a constant factor absorbed into the O(ε^{-2/(1+3ρ)}) bound. This will make the parameter-free claim fully rigorous while preserving the stated oracle complexity measured solely in function/gradient evaluations. revision: yes

  2. Referee: [Abstract] Abstract (description of inexact fixed-point iteration and secant method): the analysis asserts that root-finding subproblems can be solved inexactly to a specified relative accuracy by first-order methods without degrading the overall optimal complexity or the parameter-free property, yet no error-propagation bounds or inner-iteration counts are supplied to confirm that the inexactness preserves the Hölder-smoothness exploitation.

    Authors: We acknowledge that the current version does not supply explicit error-propagation bounds or inner-iteration counts for the inexact fixed-point iteration and secant method. The analysis already chooses the relative accuracy of each root-finding subproblem so that the overall rate is preserved, but the propagation details under Hölder smoothness are only sketched. In the revised manuscript we will insert a new lemma (with a short proof) that quantifies the propagation of inexactness, together with explicit inner-iteration bounds showing that the number of additional first-order steps remains compatible with the outer O(ε^{-2/(1+3ρ)}) complexity and does not require knowledge of ρ. This will confirm that both the optimal rate and the parameter-free property are retained. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to prior APL optimality; derivation remains independent

full rationale

The paper's central claims rest on extensions of the accelerated prox-level (APL) method, explicitly described as 'proven to be uniformly optimal for convex optimization with simple constraints' and employed for level-set methods to achieve O(ε^{-2/(1+3ρ)}) bounds. This constitutes a self-citation to prior work by overlapping authors (Lan et al.), but the citation is not load-bearing in a circular sense: the current paper's contributions (modified Polyak steps, inexact root-finding via first-order methods, and diagonal QP oracles) add independent structure without reducing any prediction or optimality result to a fit or definition internal to this manuscript. No self-definitional equations, fitted inputs renamed as predictions, or ansatz smuggling appear in the provided text. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard domain assumptions in convex optimization and the technical capability to solve certain quadratic subproblems; no free parameters are introduced as the methods are parameter-free.

axioms (3)
  • domain assumption Functions are convex.
    Fundamental to the optimization setting.
  • domain assumption Gradient satisfies Hölder smoothness with parameter ρ.
    Used for the complexity bound.
  • domain assumption Subproblems admit oracles for diagonal quadratic programs.
    Enables the first-order methods as stated in the abstract.

pith-pipeline@v0.9.0 · 5817 in / 1413 out tokens · 28740 ms · 2026-05-23T07:39:38.463501+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Universal and Parameter-free Gradient Sliding for Composite Optimization

    math.OC 2026-03 unverdicted novelty 7.0

    PFUGS is the first parameter-free gradient sliding method for composite convex problems with unknown Hölder and Lipschitz constants, using O((M_ν/ε)^{2/(1+3ν)}) subgradient evaluations of f and O((L/ε)^{1/2}) gradient...

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    M. ApS. Mosek optimization toolbox for matlab.User’s Guide and Reference Manual, Version, 4(1), 2019

  2. [2]

    A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy. Level-set methods for convex optimization.Mathematical Programming, 174:359–390, 2019

  3. [3]

    Ben-Tal and A

    A. Ben-Tal and A. Nemirovski. Non-euclidean restricted memory level method for large-scale convex optimization. Mathematical Programming, 102:407–456, 2005

  4. [4]

    Bolte, T

    J. Bolte, T. P. Nguyen, J. Peypouquet, and B. W. Suter. From error bounds to the complexity of first-order descent methods for convex functions.Mathematical Programming, 165:471–507, 2017

  5. [5]

    D. Boob, Q. Deng, and G. Lan. Stochastic first-order methods for convex and nonconvex functional constrained optimization. Mathematical Programming, pages 1–65, 2022

  6. [6]

    D. Boob, Q. Deng, and G. Lan. Level constrained first order methods for function constrained optimiza- tion. Mathematical Programming, pages 1–61, 2024

  7. [7]

    J. P. Boyd, Stephen. Subgradient methods. Notes for EE364b, Stanford University, Spring 2013–14, 2014

  8. [8]

    J. V. Burke and M. C. Ferris. Weak sharp minima in mathematical programming.SIAM Journal on Control and Optimization, 31(5):1340–1359, 1993

  9. [9]

    Chambolle and T

    A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011

  10. [10]

    Projection-Free Functional Constrained Optimization for Risk Aversion and Sparsity Control

    Y. Cheng, G. Lan, and H. E. Romeijn. Functional constrained optimization for risk aversion and sparsity control. arXiv preprint arXiv:2210.05108, 2022

  11. [11]

    Cotter, H

    A. Cotter, H. Jiang, M. Gupta, S. Wang, T. Narayan, S. You, and K. Sridharan. Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals.Journal of Machine Learning Research, 20(172):1–59, 2019

  12. [12]

    Davis, D

    D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette. Subgradient methods for sharp weakly convex functions.Journal of Optimization Theory and Applications, 179:962–982, 2018

  13. [13]

    Devanathan and S

    N. Devanathan and S. Boyd. Polyak minorant method for convex optimization.Journal of Optimization Theory and Applications, pages 1–20, 2024. 28

  14. [14]

    Guyon, Isabelle, Gunn, Steve, Ben-Hur, Asa, and D. Gideon. Arcene. UCI Machine Learning Repository,

  15. [15]

    DOI: https://doi.org/10.24432/C58P55

  16. [16]

    E. Y. Hamedani and N. S. Aybat. A primal-dual algorithm with line search for general convex-concave saddle point problems.SIAM Journal on Optimization, 31(2):1299–1329, 2021

  17. [17]

    Hazan and S

    E. Hazan and S. Kakade. Revisiting the polyak step size.arXiv preprint arXiv:1905.00313, 2019

  18. [18]

    Jiang and X

    R. Jiang and X. Li. Hölderian error bounds and kurdyka-łojasiewicz inequality for the trust region subproblem. Mathematics of Operations Research, 47(4):3025–3050, 2022

  19. [19]

    Koklu and I

    M. Koklu and I. A. Özkan. Multiclass classification of dry beans using computer vision and machine learning techniques.Comput. Electron. Agric., 174:105507, 2020. URLhttps://api.semanticscholar. org/CorpusID:219762890

  20. [20]

    G. Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Mathematical Programming, 149(1-2):1–45, 2015

  21. [21]

    G. Lan. First-order and Stochastic Optimization Methods for Machine Learning. Springer, 2020

  22. [22]

    Lan and R

    G. Lan and R. D. C. Monteiro. Iteration-complexity of first-order penalty methods for convex program- ming. Mathematical Programming, 138:115–139, 2013

  23. [23]

    Lemaréchal, A

    C. Lemaréchal, A. S. Nemirovski, and Y. E. Nesterov. New variants of bundle methods.Mathematical Programming, 69:111–148, 1995

  24. [24]

    G. Li. Global error bounds for piecewise convex polynomials.Mathematical programming, 137(1):37–64, 2013

  25. [25]

    Li and G

    T. Li and G. Lan. A simple uniformly optimal method without line search for convex optimization. arXiv preprint arXiv:2310.10082, 2023

  26. [26]

    Q. Lin, S. Nadarajah, and N. Soheili. A level-set method for convex optimization with a feasible solution path. SIAM Journal on Optimization, 28(4):3290–3311, 2018

  27. [27]

    Lin and Q

    Z. Lin and Q. Deng. Efficient first-order methods for convex optimization with strongly convex function constraints. arXiv preprint arXiv:2212.11143, 2022

  28. [28]

    Loizou, S

    N. Loizou, S. Vaswani, I. H. Laradji, and S. Lacoste-Julien. Stochastic polyak step-size for sgd: An adaptive learning rate for fast convergence. InInternational Conference on Artificial Intelligence and Statistics, pages 1306–1314. PMLR, 2021

  29. [29]

    B. S. Mordukhovich and N. M. Nam.Convex analysis and beyond. Springer, 2022

  30. [30]

    Necoara, Y

    I. Necoara, Y. Nesterov, and F. Glineur. Linear convergence of first order methods for non-strongly convex optimization. Mathematical Programming, 175(1-2):69–107, 2019. ISSN 0025-5610. doi: 10. 1007/s10107-018-1232-1

  31. [31]

    Nesterov

    Y. Nesterov. A method for solving the convex programming problem with convergence rate o (1/k2). In Dokl akad nauk Sssr, volume 269, page 543, 1983

  32. [32]

    Nesterov

    Y. Nesterov. Gradient methods for minimizing composite functions.Mathematical Programming, 140 (1):125–161, 2013. doi: 10.1007/s10107-012-0629-5

  33. [33]

    Nesterov

    Y. Nesterov. Universal gradient methods for convex optimization problems.Mathematical Programming, 152(1):381–404, 2015

  34. [34]

    Nesterov et al.Lectures on convex optimization, volume 137

    Y. Nesterov et al.Lectures on convex optimization, volume 137. Springer, 2018

  35. [35]

    K. F. Ng and X. Y. Zheng. Global error bounds with fractional exponents.Mathematical programming, 88:357–370, 2000. 29

  36. [36]

    B. T. Polyak. Introduction to optimization. 1987

  37. [37]

    Rigollet and X

    P. Rigollet and X. Tong. Neyman-pearson classification, convexity and stochastic constraints.Journal of Machine Learning Research, 2011

  38. [38]

    R. T. Rockafellar and R. J.-B. Wets.Variational analysis, volume 317. Springer Science & Business Media, 2009

  39. [39]

    X. Tong, Y. Feng, and A. Zhao. A survey on neyman-pearson classification and suggestions for future research. Wiley Interdisciplinary Reviews: Computational Statistics, 8(2):64–81, 2016

  40. [40]

    P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. submitted to SIAM Journal on Optimization, 2(3), 2008

  41. [41]

    H. H. Vui. Global holderian error bound for nondegenerate polynomials.SIAM Journal on Optimization, 23(2):917–933, 2013

  42. [42]

    Y. Xu. Iteration complexity of inexact augmented lagrangian methods for constrained convex program- ming. Mathematical Programming, 185:199–244, 2021

  43. [43]

    Zhang and G

    Z. Zhang and G. Lan. Solving convex smooth function constrained optimization is almost as easy as unconstrained optimization. arXiv preprint arXiv:2210.05807, 2022. A Missing Proofs A.1 Proof of Theorem 4.2 Let δs = ¯us − ¯ls, we haveδs+1 ≤ νδs. We shall consider the following two cases. First, suppose V (η) ≥ 1 2 ε. Then, it takes at mostS = max n 0, l l...