pith. sign in

arxiv: 2606.28278 · v1 · pith:XZIDHZEInew · submitted 2026-06-26 · 🧮 math.OC

Sharp First-Order Lower Bounds under Sublevel α-Polyak-Lojasiewicz Conditions

Pith reviewed 2026-06-29 02:37 UTC · model grok-4.3

classification 🧮 math.OC
keywords first-order methodsPolyak-Lojasiewicz conditionoptimization complexitylower boundsgradient descentstochastic gradient descentsublevel setssmooth optimization
0
0 comments X

The pith

Any first-order method requires Omega(L tau to the 2 over alpha times epsilon to the power of minus (2 minus alpha) over alpha) queries under the globally smooth sublevel-alpha-PL condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes sharp lower bounds showing that standard first-order methods cannot be improved upon for reaching epsilon accuracy on functions satisfying a relaxed alpha-Polyak-Lojasiewicz inequality. This inequality holds only inside the initial sublevel set while the function stays globally smooth. A reader would care because the bounds match the known performance of gradient descent in the deterministic case and of stochastic gradient descent when noise dominates, proving these algorithms are optimal inside the class. The work also rules out a stronger global version of the condition for non-constant functions when alpha is less than 2.

Core claim

Under the globally smooth sublevel-alpha-Polyak-Lojasiewicz class, any first-order method requires Omega(L tau^{2/alpha} epsilon^{-(2-alpha)/alpha}) queries in the deterministic oracle model and Omega(L sigma^2 tau^{4/alpha} epsilon^{-(4-alpha)/alpha}) queries in the noise-dominated stochastic model. These lower bounds match the upper rates achieved by gradient descent and SGD respectively.

What carries the argument

The sublevel-alpha-Polyak-Lojasiewicz condition, which bounds the suboptimality gap by a power alpha of the gradient norm only on the initial sublevel set.

If this is right

  • Gradient descent achieves the optimal query complexity for deterministic first-order methods on this function class.
  • Stochastic gradient descent achieves the optimal query complexity in the noise-dominated regime on this function class.
  • No first-order method can asymptotically improve on these rates without further assumptions on the function.
  • The global alpha-PL condition with alpha less than 2 is incompatible with global L-smoothness for any non-constant function.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The scaling with alpha suggests that functions near degenerate minima inherit complexity that interpolates between the Holder error-bound case and the classical PL case.
  • Similar lower-bound constructions may apply to other oracles such as higher-order or inexact first-order models.
  • Adaptive algorithms that estimate the local alpha from observed gradient norms could approach these rates without knowing alpha in advance.

Load-bearing premise

The alpha-PL inequality holds only on the initial sublevel set while the function remains globally L-smooth.

What would settle it

A concrete function in the sublevel-alpha-PL class together with a first-order method that reaches epsilon accuracy using asymptotically fewer queries than the stated lower bound.

read the original abstract

We study the optimal complexity of first-order methods under the $\alpha$-Polyak-Lojasiewicz condition with $\alpha\in[1,2)$. This condition bounds the suboptimality gap by a power $\alpha$ of the gradient norm; $\alpha=2$ recovers the classical Polyak-Lojasiewicz condition, $\alpha=1$ corresponds to a Holder error-bound regime, and intermediate exponents arise near degenerate minima in local Kurdyka-Lojasiewicz geometry. We first prove a structural obstruction: if global smoothness and a global $\alpha$-Polyak-Lojasiewicz inequality are imposed on $\mathbb{R}^d$, then every such function is constant for $\alpha<2$. This motivates the globally smooth, sublevel-$\alpha$-Polyak-Lojasiewicz class, where the inequality is required only on the initial sublevel set. On this class, we prove sharp minimax lower bounds for first-order methods. In the deterministic oracle model, any first-order method requires $\Omega(L\tau^{2/\alpha}\epsilon^{-(2-\alpha)/\alpha})$ queries to reach accuracy $\epsilon$, matching gradient descent. In the bounded-variance stochastic-gradient oracle model, any stochastic first-order method requires $\Omega(L\sigma^2\tau^{4/\alpha}\epsilon^{-(4-\alpha)/\alpha})$ queries in the noise-dominated regime, matching known SGD upper rates under trajectory-containment assumptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proves that imposing both global L-smoothness and a global α-Polyak-Łojasiewicz inequality on R^d forces every function to be constant when α<2. It therefore restricts attention to the globally L-smooth, sublevel-α-PL class (α∈[1,2)), where the inequality holds only on the initial sublevel set. On this class the authors establish matching minimax lower bounds: any deterministic first-order method requires Ω(L τ^{2/α} ε^{-(2-α)/α}) queries, and any bounded-variance stochastic first-order method requires Ω(L σ² τ^{4/α} ε^{-(4-α)/α}) queries in the noise-dominated regime. Both bounds are claimed to be tight with the known upper bounds of gradient descent and SGD under trajectory containment.

Significance. If the derivations hold, the work supplies the first sharp lower bounds for the intermediate α-PL regime that appears in local Kurdyka-Łojasiewicz geometry. The explicit obstruction result cleanly motivates the modeling choice, and the matching rates confirm optimality of standard first-order methods on this class. The deterministic and stochastic bounds are internally consistent with the descent lemma plus the given PL relation.

major comments (2)
  1. [Definition of sublevel-α-PL class] The abstract states that the lower bounds are proved on the sublevel-α-PL class, but the precise statement of the class (Definition 2.3 or equivalent) must be checked to confirm that the α-PL inequality is required only on the initial sublevel set while global L-smoothness is retained everywhere; any relaxation of the sublevel restriction would re-introduce the constant-function obstruction.
  2. [Deterministic lower bound construction] §4 (deterministic lower bound): the construction that achieves Ω(L τ^{2/α} ε^{-(2-α)/α}) must be verified to remain inside the initial sublevel set for all iterates of the adversary; if the hard instance escapes the sublevel set, the lower bound would not apply to the stated function class.
minor comments (2)
  1. [Stochastic lower bound] The noise-dominated regime in the stochastic bound should be defined explicitly (e.g., the precise relation between σ, L, τ, and ε that makes the variance term dominate).
  2. [Notation] Notation for the initial sublevel set radius τ is used without an explicit definition in the abstract; a short sentence in §2 would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and the recommendation of minor revision. The comments focus on verifying the precise definition of the function class and the validity of the lower bound construction with respect to the sublevel set. We address each point below.

read point-by-point responses
  1. Referee: [Definition of sublevel-α-PL class] The abstract states that the lower bounds are proved on the sublevel-α-PL class, but the precise statement of the class (Definition 2.3 or equivalent) must be checked to confirm that the α-PL inequality is required only on the initial sublevel set while global L-smoothness is retained everywhere; any relaxation of the sublevel restriction would re-introduce the constant-function obstruction.

    Authors: We confirm that Definition 2.3 defines the sublevel-α-PL class by requiring the α-Polyak-Łojasiewicz inequality to hold solely on the initial sublevel set {x : f(x) ≤ f(x_0)}, while L-smoothness is imposed globally on ℝ^d. This restriction is chosen precisely to circumvent the constant-function obstruction established in Section 3 for the global case when α < 2. revision: no

  2. Referee: [Deterministic lower bound construction] §4 (deterministic lower bound): the construction that achieves Ω(L τ^{2/α} ε^{-(2-α)/α}) must be verified to remain inside the initial sublevel set for all iterates of the adversary; if the hard instance escapes the sublevel set, the lower bound would not apply to the stated function class.

    Authors: The hard instance in Section 4 is constructed so that the initial sublevel set is forward-invariant under any sequence of first-order oracle responses. The proof of the lower bound explicitly verifies that all iterates generated by the adversary remain inside this set, ensuring the instance belongs to the sublevel-α-PL class throughout the trajectory. revision: no

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation begins with a structural obstruction proof (global smoothness + global α-PL for α<2 forces constant functions) that directly motivates the sublevel restriction; this is an independent mathematical fact, not a self-definition. Lower bounds are then obtained via standard minimax constructions in deterministic and bounded-variance stochastic oracle models, with rates shown to match existing GD/SGD upper bounds under trajectory containment. No parameter is fitted to data and then renamed a prediction, no load-bearing step reduces to a self-citation chain, and the argument remains self-contained against external oracle-model benchmarks without renaming known results or smuggling ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard definition of L-smoothness, the deterministic first-order oracle, and the bounded-variance stochastic-gradient oracle; these are domain assumptions rather than new free parameters or invented entities.

axioms (2)
  • domain assumption Functions are globally L-smooth and the alpha-PL inequality holds on the initial sublevel set
    Invoked to define the function class on which the lower bounds apply and to avoid the constant-function obstruction.
  • domain assumption Standard deterministic and bounded-variance stochastic first-order oracle models
    Used to state the query complexity in both regimes.

pith-pipeline@v0.9.1-grok · 5794 in / 1347 out tokens · 44077 ms · 2026-06-29T02:37:10.813730+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references · 1 linked inside Pith

  1. [1]

    Schmidt , title =

    Hamed Karimi and Julie Nutini and Mark W. Schmidt , title =. Machine Learning and Knowledge Discovery in Databases (

  2. [2]

    Optimization Letters , volume =

    Hadi Abbaszadehpeivasti and Etienne de Klerk and Moslem Zamani , title =. Optimization Letters , volume =

  3. [3]

    Advances in Neural Information Processing Systems , volume =

    Ilyas Fatkhullin and Jalal Etesami and Niao He and Negar Kiyavash , title =. Advances in Neural Information Processing Systems , volume =

  4. [4]

    Proceedings of the Thirty Sixth Conference on Learning Theory , series =

    Pengyun Yue and Cong Fang and Zhouchen Lin , title =. Proceedings of the Thirty Sixth Conference on Learning Theory , series =

  5. [5]

    Duchi and Oliver Hinder and Aaron Sidford , title =

    Yair Carmon and John C. Duchi and Oliver Hinder and Aaron Sidford , title =. Mathematical Programming , volume =

  6. [6]

    Duchi and Dylan J

    Yossi Arjevani and Yair Carmon and John C. Duchi and Dylan J. Foster and Nathan Srebro and Blake Woodworth , title =. Mathematical Programming , volume =

  7. [7]

    SIAM Journal on Optimization , volume =

    Arkadi Nemirovski and Anatoli Juditsky and Guanghui Lan and Alexander Shapiro , title =. SIAM Journal on Optimization , volume =

  8. [8]

    Bartlett and Pradeep Ravikumar and Martin J

    Alekh Agarwal and Peter L. Bartlett and Pradeep Ravikumar and Martin J. Wainwright , title =. IEEE Transactions on Information Theory , volume =

  9. [9]

    Yurii Nesterov , title =

  10. [10]

    Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (

    Saeed Masiha and Saber Salehkaleybar and Niao He and Negar Kiyavash and Patrick Thiran , title =. Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (

  11. [11]

    Une propri

    Stanis. Une propri. Les

  12. [12]

    Polyak , title =

    Boris T. Polyak , title =. USSR Computational Mathematics and Mathematical Physics , volume =

  13. [13]

    Annales de l'Institut Fourier , volume =

    Krzysztof Kurdyka , title =. Annales de l'Institut Fourier , volume =

  14. [14]

    J. The. SIAM Journal on Optimization , volume =

  15. [15]

    Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the

    Hedy Attouch and J. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the. Mathematics of Operations Research , volume =

  16. [16]

    Foundations of Computational Mathematics , volume =

    Guoyin Li and Ting Kei Pong , title =. Foundations of Computational Mathematics , volume =

  17. [17]

    Applied and Computational Harmonic Analysis , volume =

    Chaoyue Liu and Libin Zhu and Mikhail Belkin , title =. Applied and Computational Harmonic Analysis , volume =

  18. [18]

    Advances in Neural Information Processing Systems , volume =

    Saeed Masiha and Saber Salehkaleybar and Niao He and Negar Kiyavash , title =. Advances in Neural Information Processing Systems , volume =

  19. [19]

    Nemirovsky and David B

    Arkadi S. Nemirovsky and David B. Yudin , title =

  20. [20]

    Convex Optimization: Algorithms and Complexity , journal =

    S. Convex Optimization: Algorithms and Complexity , journal =

  21. [21]

    Lewis , title =

    Dmitriy Drusvyatskiy and Adrian S. Lewis , title =. Mathematics of Operations Research , volume =

  22. [22]

    arXiv preprint arXiv:2303.00096 , year =

    Quentin Rebjock and Nicolas Boumal , title =. arXiv preprint arXiv:2303.00096 , year =

  23. [23]

    Proximal alternating linearized minimization for nonconvex and nonsmooth problems , journal =

    J. Proximal alternating linearized minimization for nonconvex and nonsmooth problems , journal =

  24. [24]

    Journal of Optimization Theory and Applications , volume =

    Pierre Frankel and Guillaume Garrigos and Juan Peypouquet , title =. Journal of Optimization Theory and Applications , volume =

  25. [25]

    Nguyen and Jie Liu and Katya Scheinberg and Martin Tak

    Lam M. Nguyen and Jie Liu and Katya Scheinberg and Martin Tak. Proceedings of the 34th International Conference on Machine Learning (

  26. [26]

    Reddi and Ahmed Hefny and Suvrit Sra and Barnab

    Sashank J. Reddi and Ahmed Hefny and Suvrit Sra and Barnab. Stochastic Variance Reduction for Nonconvex Optimization , booktitle =

  27. [27]

    Advances in Neural Information Processing Systems (

    Cong Fang and Chris Junchi Li and Zhouchen Lin and Tong Zhang , title =. Advances in Neural Information Processing Systems (

  28. [28]

    arXiv preprint arXiv:1811.02564 , year =

    Raef Bassily and Mikhail Belkin and Siyuan Ma , title =. arXiv preprint arXiv:1811.02564 , year =

  29. [29]

    Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (

    Sharan Vaswani and Francis Bach and Mark Schmidt , title =. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (

  30. [30]

    Better Theory for

    Ahmed Khaled and Peter Richt. Better Theory for. Transactions on Machine Learning Research , year =

  31. [31]

    Coddington and Norman Levinson , title =

    Earl A. Coddington and Norman Levinson , title =