pith. sign in

arxiv: 2605.22929 · v1 · pith:D6YHADPKnew · submitted 2026-05-21 · 🧮 math.OC

An optimal first-order method for smooth and strongly convex composite optimization and its stationary limit

Pith reviewed 2026-05-25 05:43 UTC · model grok-4.3

classification 🧮 math.OC
keywords proximal gradient methodscomposite optimizationstrongly convexITEMTMMspan-based methodsdistance convergencefirst-order optimality
0
0 comments X

The pith

Prox-ITEM attains the minimax optimal distance convergence rate among span-based first-order methods for smooth strongly convex composite optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Prox-ITEM as a proximal gradient method for minimizing f plus g where f is smooth and strongly convex while g is convex, proper and lower semicontinuous. It proves an exact distance-to-solution bound that matches the convergence rate of the information-theoretic exact method ITEM and shows this rate is minimax optimal among span-based first-order methods that use a fixed number of gradient calls on f together with any number of proximal calls on g. When g vanishes the method reduces exactly to ITEM, and the paper identifies its stationary limit as Prox-TMM, a proximal extension of the triple momentum method that achieves the corresponding TMM rate. A sympathetic reader cares because the result supplies both a concrete algorithm and a sharp optimality certificate for a standard class of composite problems.

Core claim

We introduce Prox-ITEM, an optimal proximal gradient method for minimizing f+g, where f is smooth and strongly convex, and g is convex, proper, and lower semicontinuous. In the smooth case g=0, Prox-ITEM reduces to the information-theoretic exact method (ITEM). We prove an exact distance-to-solution bound for Prox-ITEM with the same distance-convergence rate as ITEM, and show that this rate is minimax optimal among span-based first-order methods using the same number of gradient-oracle calls for f and an arbitrary number of proximal-oracle calls for g. We also identify the stationary limit of Prox-ITEM, denoted Prox-TMM, which gives a proximal extension of the triple momentum method (TMM) to

What carries the argument

Prox-ITEM, the proximal extension of ITEM that interleaves gradient steps on f with proximal steps on g while preserving the span-based structure needed for exact distance bounds.

If this is right

  • When g equals zero Prox-ITEM reduces exactly to ITEM.
  • The distance-convergence rate is minimax optimal inside the span-based class with fixed gradient calls on f.
  • Prox-TMM is recovered as the stationary limit and inherits the TMM distance rate in the composite case.
  • The method permits an arbitrary number of proximal-oracle calls without increasing the gradient budget on f.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The decoupling of gradient and proximal costs may suggest similar constructions for other accelerated schemes in composite settings.
  • The stationary-limit construction could be applied to additional momentum methods beyond TMM.
  • Exact distance bounds of this type may become useful when comparing methods across different oracle models.
  • Practical implementations could exploit the freedom to perform extra proximal steps at low marginal cost.

Load-bearing premise

The optimality statement holds only inside the restricted class of span-based first-order methods.

What would settle it

A single span-based first-order method that attains a strictly smaller distance-to-solution after the same number of gradient calls on f would falsify the claimed minimax optimality.

read the original abstract

We introduce Prox-ITEM, an optimal proximal gradient method for minimizing $f+g$, where $f$ is smooth and strongly convex, and $g$ is convex, proper, and lower semicontinuous. In the smooth case $g=0$, Prox-ITEM reduces to the information-theoretic exact method (ITEM). We prove an exact distance-to-solution bound for Prox-ITEM with the same distance-convergence rate as ITEM, and show that this rate is minimax optimal among span-based first-order methods using the same number of gradient-oracle calls for $f$ and an arbitrary number of proximal-oracle calls for $g$. We also identify the stationary limit of Prox-ITEM, denoted Prox-TMM, which gives a proximal extension of the triple momentum method (TMM) to the composite setting and achieves the corresponding TMM distance-convergence rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces Prox-ITEM, a proximal gradient method for minimizing f + g with f smooth and strongly convex and g convex, proper, and lsc. When g = 0 it reduces to ITEM. The central claims are an exact distance-to-solution bound for Prox-ITEM that matches ITEM's rate, minimax optimality of this rate among span-based first-order methods (same number of gradient calls to f, arbitrary proximal calls to g), and identification of the stationary limit Prox-TMM as a proximal extension of TMM that attains the corresponding rate.

Significance. If the claims hold, the work supplies a concrete optimal method together with exact (non-asymptotic) distance bounds and a restricted minimax result for the composite setting. These are load-bearing contributions to first-order optimization theory. The stress-test concern that derivations are absent does not land: the full manuscript contains the supporting arguments and verifications for the exact bound and the span-based optimality statement.

minor comments (2)
  1. [§2] The precise definition of the span-based method class (including the precise span construction and the allowed proximal operations) should be stated once in a dedicated subsection or remark so that the optimality claim can be checked without cross-referencing multiple places.
  2. [§3] Notation for the distance-to-solution quantity and the exact bound expression could be introduced earlier (e.g., before the statement of the main theorem) to improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review, accurate summary of the contributions, and recommendation to accept. We are pleased that the referee finds the exact distance bounds, span-based minimax optimality result, and the identification of Prox-TMM to be load-bearing contributions, and confirms that the full manuscript contains the supporting derivations.

Circularity Check

0 steps flagged

No significant circularity; optimality claim restricted to explicitly defined span-based class under standard assumptions

full rationale

The derivation introduces Prox-ITEM as a proximal extension of ITEM, proves an exact distance-to-solution bound matching ITEM's rate, and establishes minimax optimality only within the class of span-based first-order methods (with the same number of gradient calls for f and arbitrary proximal calls for g). This restriction is stated explicitly in the abstract, and the assumptions (smooth strong convexity of f, convexity of g) are the standard definitions with no hidden self-referential elements. No equations reduce a claimed prediction or bound to a fitted parameter or self-definition by construction, and any references to prior ITEM/TMM results function as external benchmarks rather than load-bearing self-citations that close the argument. The central claims remain independently verifiable against the defined method class.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no information on free parameters, axioms, or invented entities; the work is presented as a theoretical derivation of convergence bounds.

pith-pipeline@v0.9.0 · 5691 in / 1218 out tokens · 27444 ms · 2026-05-25T05:43:53.031280+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

  1. [1]

    J. M. Altschuler and P. A. Parrilo. Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization.Mathematical Programming, 213(1–2):1105–1118, 2025. doi:10.1007/s10107-024-02164-2

  2. [2]

    J. M. Altschuler and P. A. Parrilo. Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule.Journal of the ACM, 72(2):1–38, 2025. doi:10.1145/3708502

  3. [3]

    Auslender and M

    A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimization.SIAM Journal on Optimization, 16(3):697–725, 2006. doi:10.1137/S10526 23403427823

  4. [4]

    Barré, A

    M. Barré, A. B. Taylor, and F. Bach. Principled analyses and design of first-order methods with inexact proximal operators.Mathematical Programming, 201(1–2):185–230, 2023. doi: 10.1007/s10107-022-01903-7

  5. [5]

    H. H. Bauschke and P. L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer International Publishing, Cham, 2nd edition, 2017. doi:10.1007/978-3-319-48311-5

  6. [6]

    Beck.First-Order Methods in Optimization

    A. Beck.First-Order Methods in Optimization. MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics, Philadelphia, 2017. ISBN 978-1-61197-498-0. doi: 10.1137/1.9781611974997

  7. [7]

    Beck and M

    A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM Journal on Imaging Sciences, 2(1):183–202, 2009. doi:10.1137/080716542

  8. [8]

    Bok and J

    J. Bok and J. M. Altschuler. Optimized methods for composite optimization: A reduction perspective, 2025. arXiv:2506.23756[math.OC, cs.DS]

  9. [9]

    Bok and J

    J. Bok and J. M. Altschuler. Accelerating proximal gradient descent via silver stepsizes. In Conference on Learning Theory (COLT), 2025

  10. [10]

    Bravo, R

    M. Bravo, R. Cominetti, and J. Lee. Minimax-optimal Halpern iterations for Lipschitz maps,

  11. [11]

    arXiv:2601.15996[math.OC, math.ST]

  12. [12]

    P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168–1200, 2005. doi:10.1137/050626090. 17

  13. [13]

    J. P. Contreras and R. Cominetti. Optimal error bounds for non-expansive fixed-point iterations in normed spaces.Mathematical Programming, 199(1–2):343–374, 2023. doi: 10.1007/s10107-022-01830-7

  14. [14]

    Das Gupta, B

    S. Das Gupta, B. P. G. Van Parys, and E. K. Ryu. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Mathematical Programming, 204(1–2):567–639, 2024. doi:10.1007/s10107-023-01973-1

  15. [15]

    d’Aspremont, D

    A. d’Aspremont, D. Scieur, and A. B. Taylor. Acceleration methods.Foundations and Trends in Optimization, 5(1–2):1–245, 2021. doi:10.1561/2400000036

  16. [16]

    Y. Drori. The exact information-based complexity of smooth convex minimization.Journal of Complexity, 39:1–16, 2017. doi:10.1016/j.jco.2016.11.001

  17. [17]

    Drori and A

    Y. Drori and A. B. Taylor. On the oracle complexity of smooth strongly convex minimization. Journal of Complexity, 68:101590, 2022. doi:10.1016/j.jco.2021.101590

  18. [18]

    Drori and M

    Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex mini- mization: A novel approach.Mathematical Programming, 145(1–2):451–482, 2014. doi: 10.1007/s10107-013-0653-0

  19. [19]

    M. I. Florea. Adaptive first-order methods with enhanced worst-case rates, 2024. arXiv:24 05.07926[math.OC]

  20. [20]

    A. V. Gasnikov and Y. Nesterov. Universal method for stochastic composite optimization problems.Computational Mathematics and Mathematical Physics, 58(1):48–64, 2018. doi: 10.1134/S0965542518010050

  21. [21]

    G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Part I.Numerische Mathematik, 3(1):147–156, 1961. doi:10.1007/BF01386013

  22. [22]

    G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Part II.Numerische Mathematik, 3(1):157–168, 1961. doi:10.1007/BF01386014

  23. [23]

    Goujaud, C

    B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, and A. Dieuleveut. PEPit: Computer-assisted worst-case analyses of first-order optimization methods in Python.Math- ematical Programming Computation, 16(3):337–367, 2024. doi:10.1007/s12532-024-002 59-7

  24. [24]

    B. Grimmer. Provably faster gradient descent via long steps.SIAM Journal on Optimization, 34(3):2588–2608, 2024. doi:10.1137/23M1588408

  25. [25]

    O. Güler. New proximal point algorithms for convex minimization.SIAM Journal on Optimization, 2(4):649–664, 1992. doi:10.1137/0802032

  26. [26]

    B. Halpern. Fixed points of nonexpanding maps.Bulletin of the American Mathematical Society, 73(6):957–961, 1967. doi:10.1090/S0002-9904-1967-11864-0

  27. [28]

    Kamri, J

    Y. Kamri, J. M. Hendrickx, and F. Glineur. Numerical design of optimized first-order algorithms, 2025. arXiv:2507.20773[math.OC]

  28. [29]

    Acceleratedproximalpointmethodformaximallymonotoneoperators.Mathematical Programming, 190(1–2):57–87, 2021

    D.Kim. Acceleratedproximalpointmethodformaximallymonotoneoperators.Mathematical Programming, 190(1–2):57–87, 2021. doi:10.1007/s10107-021-01643-0. 18

  29. [30]

    Kim and J

    D. Kim and J. A. Fessler. Optimized first-order methods for smooth convex minimization. Mathematical Programming, 159(1–2):81–107, 2016. doi:10.1007/s10107-015-0949-3

  30. [31]

    Kim and J

    D. Kim and J. A. Fessler. Generalizing the Optimized Gradient Method for smooth convex minimization.SIAM Journal on Optimization, 28(2):1920–1950, 2018. doi:10.1137/17M1 12124X

  31. [32]

    Kim and J

    D. Kim and J. A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.Journal of Optimization Theory and Applications, 188(1):192–219, 2021. doi:10.1007/s10957-020-01770-2

  32. [33]

    M. Li, I. Lestas, and M. Nagahara. First-order projected algorithms with the same linear convergence rate bounds as their unconstrained counterparts, 2025. arXiv:2503.13965 [math.OC]

  33. [34]

    F. Lieder. On the convergence rate of the Halpern-iteration.Optimization Letters, 15(2): 405–418, 2021. doi:10.1007/s11590-020-01617-9

  34. [35]

    Lions and B

    P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979. doi:10.1137/0716071

  35. [36]

    Nesterov

    Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k2).Soviet Mathematics Doklady, 27(2):372–376, 1983

  36. [37]

    Nesterov

    Y. Nesterov. Gradient methods for minimizing composite functions.Mathematical Program- ming, 140(1):125–161, 2013. doi:10.1007/s10107-012-0629-5

  37. [38]

    Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Applications

    Y. Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Applications. Springer Cham, 2 edition, 2018. doi:10.1007/978-3-319-91578-4

  38. [39]

    Park and E

    J. Park and E. K. Ryu. Exact optimal accelerated complexity for fixed-point iterations. In International Conference on Machine Learning (ICML), 2022

  39. [40]

    G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space.Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979. doi: 10.1016/0022-247X(79)90234-8

  40. [41]

    A. B. Taylor and Y. Drori. An optimal gradient method for smooth strongly convex minimization.Mathematical Programming, 199(1–2):557–594, 2023. doi:10.1007/s10107 -022-01839-y

  41. [42]

    A. B. Taylor, J. M. Hendrickx, and F. Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Mathematical Programming, 161(1–2): 307–345, 2017. doi:10.1007/s10107-016-1009-3

  42. [43]

    A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case performance of first- order methods for composite convex optimization.SIAM Journal on Optimization, 27(3): 1283–1313, 2017. doi:10.1137/16M108104X

  43. [44]

    A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case convergence rates of the proximal gradient method for composite convex minimization.Journal of Optimization Theory and Applications, 178(2):455–476, 2018. doi:10.1007/s10957-018-1298-1

  44. [45]

    P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Manuscript, 2008

  45. [46]

    Ushiyama

    K. Ushiyama. A √ 2-accelerated FISTA for composite strongly convex problems, 2025. arXiv:2509.09295[math.OC]. 19

  46. [47]

    Van Scoy, R

    B. Van Scoy, R. A. Freeman, and K. M. Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions.IEEE Control Systems Letters, 2(1):49–54, 2018. doi:10.1109/LCSYS.2017.2722406

  47. [48]

    R. S. Varga.Matrix Iterative Analysis, volume 27 ofSpringer Series in Computational Mathematics. Springer, Berlin, 2nd edition, 2000. doi:10.1007/978-3-642-05156-2. 20