Universal and Parameter-free Gradient Sliding for Composite Optimization

Qi Luo; Yan Wu; Yuyuan Ouyang; Zhe Zhang

arxiv: 2603.23492 · v2 · pith:HUQ3PKE2new · submitted 2026-03-24 · 🧮 math.OC

Universal and Parameter-free Gradient Sliding for Composite Optimization

Yan Wu , Yuyuan Ouyang , Zhe Zhang , Qi Luo This is my paper

Pith reviewed 2026-05-21 10:21 UTC · model grok-4.3

classification 🧮 math.OC

keywords composite convex optimizationgradient slidingparameter-free methodsHölder continuous subgradientsLipschitz gradientuniversal algorithmsfirst-order methods

0 comments

The pith

PFUGS solves convex composite optimization to ε-accuracy using O((M_ν/ε)^{2/(1+3ν)}) subgradient calls for f and O((L/ε)^{1/2}) gradient calls for g, with no knowledge of the constants M_ν, ν or L.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a parameter-free gradient sliding method for minimizing the sum of a convex function f whose subgradients are Hölder continuous of order ν and a convex function g whose gradient is Lipschitz continuous. The algorithm produces an ε-approximate minimizer while using only the stated number of first-order oracle calls for each function separately, and it does so without any prior estimates of the Hölder or Lipschitz constants. This removes a practical barrier because earlier gradient-sliding schemes required the user to supply those constants in advance to achieve the same rates. A reader should care because many real composite problems, such as regularized regression or constrained convex programs, have functions whose smoothness parameters are difficult or impossible to compute ahead of time.

Core claim

The PFUGS algorithm computes an ε-approximate solution to the convex composite optimization problem min_{x in X} {f(x) + g(x)} with O((M_ν/ε)^{2/(1+3ν)}) evaluations of (sub)gradients of f and O((L/ε)^{1/2}) evaluations of gradients of g, without prior knowledge of the problem constants M_ν, ν, and L.

What carries the argument

The Parameter-Free Universal Gradient Sliding (PFUGS) algorithm, which combines a universal gradient step for the Hölder part f with a sliding inner loop for the smooth part g and uses observed gradient norms to adapt step sizes on the fly.

If this is right

The algorithm attains the optimal complexity known for the Hölder-smooth term f while treating its parameters as unknown.
It simultaneously attains the optimal complexity known for the smooth term g while treating its Lipschitz constant as unknown.
No separate tuning or restarting phase is required to achieve both rates at once.
The method extends gradient sliding to the setting in which the two summands have genuinely different and a priori unknown smoothness classes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same adaptive sliding structure could be tested on problems with three or more summands by cycling through pairwise slides.
Because the method never needs the constants, it may serve as a drop-in solver for black-box composite problems arising in machine learning where smoothness parameters change with data.
A natural next measurement would be the practical wall-clock time on large-scale instances compared with restarted or tuned baselines.

Load-bearing premise

The functions must obey the given Hölder subgradient condition on f and Lipschitz gradient condition on g, and the adaptive rule inside PFUGS must recover the claimed complexity bounds even though it never receives the numerical values of M_ν, ν, or L.

What would settle it

Implement PFUGS on a simple quadratic or Hölder example where the exact constants M_ν, ν, and L are known in closed form, run it with only gradient oracles and no constant inputs, and verify whether the observed number of calls stays within the stated big-O bounds for a sequence of decreasing ε.

read the original abstract

We propose a Parameter-Free Universal Gradient Sliding (PFUGS) algorithm for computing an approximate solution to the convex composite optimization $\min_{x\in X} \{f(x) + g(x)\}$, where $f$ has $(M_\nu,\nu)$-H\"older continuous subgradient and $g$ has $L$-Lipschitz continuous gradient. PFUGS computes an $\varepsilon$-approximate solution with $\mathcal{O}((M_\nu/\varepsilon)^{{2}/{(1+3\nu)}})$ evaluations of (sub)gradients of $f$ and $\mathcal{O}((L/\varepsilon)^{1/2})$ evaluations of gradients of $g$, without prior knowledge of problem constants. To the best of our knowledge, PFUGS is the first gradient sliding algorithm for problems involving two functions whose distinct problem constants are both unknown a priori.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a parameter-free gradient sliding method for composite convex problems that avoids knowing either the Hölder constant or the smoothness constant ahead of time.

read the letter

The main thing to know is that PFUGS reaches the expected rates for composite optimization without needing estimates for M_ν or L. It uses O((M_ν/ε)^{2/(1+3ν)}) subgradient calls on f and O((L/ε)^{1/2}) gradient calls on g, and the authors position it as the first gradient sliding approach that handles both unknown constants at once in this setting. That matches the abstract claim and seems to follow from applying standard adaptation steps to existing sliding frameworks. The work is straightforward in that sense and addresses a real usability issue in first-order methods where constants are rarely known precisely. The proofs presumably rely on doubling searches or restarts to adjust steps without changing the leading exponents much, which is a common and reliable route. This part holds up as a clean extension rather than a deep theoretical shift. The rates stay the same as the known-constant versions, so the gain is mainly in removing the need for prior knowledge or tuning. No load-bearing contradictions appear in the stated bounds, and the citation pattern builds directly on prior gradient sliding results without obvious gaps. The paper targets readers working on practical convex optimization algorithms, especially those who implement first-order methods on composite problems and want to skip constant estimation. Someone focused on making theory more deployable would get the most out of it. It deserves a serious referee because the claim is concrete, the setting is standard, and the parameter-free property is a useful incremental step worth checking in detail. I would send it for review to verify the adaptation mechanics and any supporting experiments.

Referee Report

0 major / 4 minor

Summary. The manuscript proposes the Parameter-Free Universal Gradient Sliding (PFUGS) algorithm for the convex composite problem min_{x in X} {f(x) + g(x)}, where f satisfies the (M_ν, ν)-Hölder continuous subgradient condition and g has L-Lipschitz continuous gradients. PFUGS is claimed to return an ε-approximate solution using O((M_ν/ε)^{2/(1+3ν)}) (sub)gradient evaluations of f and O((L/ε)^{1/2}) gradient evaluations of g, without any prior knowledge of M_ν or L. The authors state that this is the first gradient-sliding method in which both distinct problem constants are unknown a priori.

Significance. If the stated complexity bounds hold, the result would be a useful practical advance: it removes the need to estimate or tune two separate constants that are often difficult to obtain in applications. The work extends existing gradient-sliding frameworks to a fully parameter-free regime while preserving the optimal exponents for the Hölder and smooth components, which could serve as a template for other composite or multi-function settings.

minor comments (4)

§2.1: the precise statement of the (M_ν, ν)-Hölder condition on the subgradient of f should include the domain restriction and the constant M_ν explicitly, to avoid ambiguity when the algorithm later estimates this quantity.
Algorithm 1 (PFUGS pseudocode): the inner-loop termination criterion and the doubling-search rule for estimating M_ν are presented without an accompanying line-by-line complexity accounting; a short table or remark summarizing the per-phase cost would improve readability.
Theorem 3.1: the proof sketch invokes a standard potential-function argument, but the transition from the estimated constants to the final bound is only outlined; adding one or two intermediate lemmas that bound the number of failed doubling phases would make the argument self-contained.
§4 (numerical experiments): the test problems use synthetic data with known constants; a brief discussion of how the method behaves when the true M_ν and L are replaced by crude upper bounds would strengthen the practical claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. The report correctly summarizes the contributions of PFUGS as the first parameter-free gradient sliding method for composite convex optimization with unknown Hölder and Lipschitz constants. We address the points raised below.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes PFUGS as a parameter-free extension of gradient sliding methods for composite convex optimization under Hölder and Lipschitz assumptions. The claimed complexity bounds are derived from standard adaptive techniques (such as doubling searches or restarts) applied to the distinct constants of f and g, without any reduction of the target rates to fitted parameters, self-definitions, or load-bearing self-citations. The derivation remains self-contained against external benchmarks for gradient sliding and parameter-free adaptation, with no steps that equate outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard domain assumptions for the problem class; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract description.

axioms (1)

domain assumption f has (M_ν, ν)-Hölder continuous subgradient and g has L-Lipschitz continuous gradient.
This defines the problem class for which PFUGS is designed and the complexity is claimed.

pith-pipeline@v0.9.0 · 5678 in / 1251 out tokens · 57096 ms · 2026-05-21T10:21:34.750756+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

PFUGS computes an ε-approximate solution with O((M_ν/ε)^{2/(1+3ν)}) evaluations of (sub)gradients of f and O((L/ε)^{1/2}) evaluations of gradients of g, without prior knowledge of problem constants
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the modified subroutine ... updates the total number of iterations on the fly ... η_s^k ∑ 1/p_t,s^k stays invariant

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 3 internal anchors

[1]

Nesterov finds graal: Optimal and adap- tive gradient method for convex optimization.arXiv preprint arXiv:2507.09823, 2025

Ekaterina Borodich and Dmitry Kovalev. Nesterov finds graal: Optimal and adap- tive gradient method for convex optimization.arXiv preprint arXiv:2507.09823, 2025

work page arXiv 2025
[2]

Y. Chen, G. Lan, Y. Ouyang, and W. Zhang. Fast bundle-level methods for un- constrained and ball-constrained convex optimization.Computational Optimization and Applications, 73(1):159–199, 2019

work page 2019
[3]

Acceleration techniques for level bun- dle methods in weakly smooth convex constrained optimization.Computational Optimization and Applications, 77(2):411–432, 2020

Yunmei Chen, Xiaojing Ye, and Wei Zhang. Acceleration techniques for level bun- dle methods in weakly smooth convex constrained optimization.Computational Optimization and Applications, 77(2):411–432, 2020

work page 2020
[4]

Uniformly Optimal and Parameter-free First-order Methods for Convex and Function-constrained Optimization

Qi Deng, Guanghui Lan, and Zhenwei Lin. Uniformly optimal and parameter- free first-order methods for convex and function-constrained optimization.arXiv preprint arXiv:2412.06319, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

Vincent Guigues, Jiaming Liang, and Renato D. C. Monteiro. Universal subgradient and proximal bundle methods for convex and strongly convex hybrid composite optimization.Journal of Optimization Theory and Applications, 208:112, 2026. Universal and Parameter-free Gradient Sliding

work page 2026
[6]

Vincent Guigues, Renato D. C. Monteiro, and Benoit Tran. Complexity and numer- ical experiments of a new adaptive generic proximal bundle method.arXiv preprint arXiv:2410.11066, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[7]

A parameter-free conditional gradient method for composite minimization under Hölder condition.Journal of Machine Learning Research, 24(166):1–34, 2023

Masaru Ito, Zhaosong Lu, and Chuan He. A parameter-free conditional gradient method for composite minimization under Hölder condition.Journal of Machine Learning Research, 24(166):1–34, 2023

work page 2023
[8]

G. Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization.Mathematical Programming, 149(1):1–45, 2015

work page 2015
[9]

Gradient sliding for composite optimization.Mathematical Pro- gramming, 159(1):201–235, 2016

Guanghui Lan. Gradient sliding for composite optimization.Mathematical Pro- gramming, 159(1):201–235, 2016

work page 2016
[10]

Springer, Cham, Switzerland, 2020

Guanghui Lan.First-order and stochastic optimization methods for machine learn- ing. Springer, Cham, Switzerland, 2020

work page 2020
[11]

Lan and T

Guanghui Lan and Tianjiao Li. Auto-conditioned primal-dual hybrid gradi- ent method and alternating direction method of multipliers.arXiv preprint arXiv:2410.01979, 2024

work page arXiv 2024
[12]

Projected gradient methods for nonconvex and stochastic smooth optimization: new complexities and auto-conditioned stepsizes

Guanghui Lan, Tianjiao Li, and Yangyang Xu. Projected gradient methods for non- convex and stochastic optimization: new complexities and auto-conditioned step- sizes.arXiv preprint arXiv:2412.14291, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[13]

Accelerated gradient sliding for structured convex optimization.Computational Optimization and Applications, 82(2):361–394, 2022

Guanghui Lan and Yuyuan Ouyang. Accelerated gradient sliding for structured convex optimization.Computational Optimization and Applications, 82(2):361–394, 2022

work page 2022
[14]

Optimal and parameter-free gra- dient minimization methods for convex and nonconvex optimization.arXiv preprint arXiv:2310.12139, 2023

Guanghui Lan, Yuyuan Ouyang, and Zhe Zhang. Optimal and parameter-free gra- dient minimization methods for convex and nonconvex optimization.arXiv preprint arXiv:2310.12139, 2023

work page arXiv 2023
[15]

A simple uniformly optimal method without line search for convex optimization.Mathematical Programming, pages 1–38, 2025

Tianjiao Li and Guanghui Lan. A simple uniformly optimal method without line search for convex optimization.Mathematical Programming, pages 1–38, 2025

work page 2025
[16]

Accelerated first-order methods for convex opti- mization with locally lipschitz continuous gradient.SIAM Journal on Optimization, 33(3):2275–2310, 2023

Zhaosong Lu and Sanyou Mei. Accelerated first-order methods for convex opti- mization with locally lipschitz continuous gradient.SIAM Journal on Optimization, 33(3):2275–2310, 2023

work page 2023
[17]

Primal-dual extrapolation methods for monotone inclusions under local lipschitz continuity.Mathematics of Operations Research, 50(4):2577–2599, 2025

Zhaosong Lu and Sanyou Mei. Primal-dual extrapolation methods for monotone inclusions under local lipschitz continuity.Mathematics of Operations Research, 50(4):2577–2599, 2025

work page 2025
[18]

Adaptive gradient descent without descent

Yura Malitsky and Konstantin Mishchenko. Adaptive gradient descent without descent. InProceedings of the 37th International Conference on Machine Learning, volume 119 ofProceedings of Machine Learning Research, pages 6702–6712. PMLR, 2020. Universal and Parameter-free Gradient Sliding

work page 2020
[19]

Adaptive proximal gradient method for convex optimization

Yura Malitsky and Konstantin Mishchenko. Adaptive proximal gradient method for convex optimization. InAdvances in Neural Information Processing Systems, volume 37, 2024

work page 2024
[20]

Renato D. C. Monteiro, Camilo Ortiz, and Benar F. Svaiter. An adaptive acceler- ated first-order method for convex optimization.Computational Optimization and Applications, 64(1):31–73, 2016

work page 2016
[21]

Renato D. C. Monteiro and Honghao Zhang. Parameter-free proximal bundle meth- ods with adaptive stepsizes for hybrid convex composite optimization problems. arXiv preprint arXiv:2410.20751, 2024

work page arXiv 2024
[22]

Y. E. Nesterov. A method for unconstrained convex minimization problem with the rate of convergenceO(1/k2).Doklady AN SSSR, 269:543–547, 1983. translated as Soviet Math. Docl

work page 1983
[23]

Y. E. Nesterov. Universal gradient methods for convex optimization problems. Mathematical Programming, 152(1-2):381–404, 2015

work page 2015
[24]

Gradient methods for minimizing composite functions.Mathematical Programming, pages 1–37, 2012

Yu Nesterov. Gradient methods for minimizing composite functions.Mathematical Programming, pages 1–37, 2012

work page 2012
[25]

Universal conditional gradient sliding for convex optimization.SIAM Journal on Optimization, 33(4):2962–2987, 2023

Yuyuan Ouyang and Trevor Squires. Universal conditional gradient sliding for convex optimization.SIAM Journal on Optimization, 33(4):2962–2987, 2023

work page 2023
[26]

Proximal algorithms.Foundations and Trends® in Optimization, 1(3):127–239, 2014

Neal Parikh, Stephen Boyd, et al. Proximal algorithms.Foundations and Trends® in Optimization, 1(3):127–239, 2014

work page 2014
[27]

Suh and Shiqian Ma

Jaewook J. Suh and Shiqian Ma. An adaptive and parameter-free nesterov’s accel- erated gradient method for convex optimization.arXiv preprint arXiv:2505.11670, 2025

work page arXiv 2025
[28]

Arnesh Sujanani and Renato D. C. Monteiro. Efficient parameter-free restarted accelerated gradient methods for convex and strongly convex optimization.Journal of Optimization Theory and Applications, 206:52, 2025. Universal and Parameter-free Gradient Sliding A Omitted proofs In this appendix, we present several proofs that were omitted from the main text...

work page 2025

[1] [1]

Nesterov finds graal: Optimal and adap- tive gradient method for convex optimization.arXiv preprint arXiv:2507.09823, 2025

Ekaterina Borodich and Dmitry Kovalev. Nesterov finds graal: Optimal and adap- tive gradient method for convex optimization.arXiv preprint arXiv:2507.09823, 2025

work page arXiv 2025

[2] [2]

Y. Chen, G. Lan, Y. Ouyang, and W. Zhang. Fast bundle-level methods for un- constrained and ball-constrained convex optimization.Computational Optimization and Applications, 73(1):159–199, 2019

work page 2019

[3] [3]

Acceleration techniques for level bun- dle methods in weakly smooth convex constrained optimization.Computational Optimization and Applications, 77(2):411–432, 2020

Yunmei Chen, Xiaojing Ye, and Wei Zhang. Acceleration techniques for level bun- dle methods in weakly smooth convex constrained optimization.Computational Optimization and Applications, 77(2):411–432, 2020

work page 2020

[4] [4]

Uniformly Optimal and Parameter-free First-order Methods for Convex and Function-constrained Optimization

Qi Deng, Guanghui Lan, and Zhenwei Lin. Uniformly optimal and parameter- free first-order methods for convex and function-constrained optimization.arXiv preprint arXiv:2412.06319, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[5] [5]

Vincent Guigues, Jiaming Liang, and Renato D. C. Monteiro. Universal subgradient and proximal bundle methods for convex and strongly convex hybrid composite optimization.Journal of Optimization Theory and Applications, 208:112, 2026. Universal and Parameter-free Gradient Sliding

work page 2026

[6] [6]

Vincent Guigues, Renato D. C. Monteiro, and Benoit Tran. Complexity and numer- ical experiments of a new adaptive generic proximal bundle method.arXiv preprint arXiv:2410.11066, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

A parameter-free conditional gradient method for composite minimization under Hölder condition.Journal of Machine Learning Research, 24(166):1–34, 2023

Masaru Ito, Zhaosong Lu, and Chuan He. A parameter-free conditional gradient method for composite minimization under Hölder condition.Journal of Machine Learning Research, 24(166):1–34, 2023

work page 2023

[8] [8]

G. Lan. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization.Mathematical Programming, 149(1):1–45, 2015

work page 2015

[9] [9]

Gradient sliding for composite optimization.Mathematical Pro- gramming, 159(1):201–235, 2016

Guanghui Lan. Gradient sliding for composite optimization.Mathematical Pro- gramming, 159(1):201–235, 2016

work page 2016

[10] [10]

Springer, Cham, Switzerland, 2020

Guanghui Lan.First-order and stochastic optimization methods for machine learn- ing. Springer, Cham, Switzerland, 2020

work page 2020

[11] [11]

Lan and T

Guanghui Lan and Tianjiao Li. Auto-conditioned primal-dual hybrid gradi- ent method and alternating direction method of multipliers.arXiv preprint arXiv:2410.01979, 2024

work page arXiv 2024

[12] [12]

Projected gradient methods for nonconvex and stochastic smooth optimization: new complexities and auto-conditioned stepsizes

Guanghui Lan, Tianjiao Li, and Yangyang Xu. Projected gradient methods for non- convex and stochastic optimization: new complexities and auto-conditioned step- sizes.arXiv preprint arXiv:2412.14291, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[13] [13]

Accelerated gradient sliding for structured convex optimization.Computational Optimization and Applications, 82(2):361–394, 2022

Guanghui Lan and Yuyuan Ouyang. Accelerated gradient sliding for structured convex optimization.Computational Optimization and Applications, 82(2):361–394, 2022

work page 2022

[14] [14]

Optimal and parameter-free gra- dient minimization methods for convex and nonconvex optimization.arXiv preprint arXiv:2310.12139, 2023

Guanghui Lan, Yuyuan Ouyang, and Zhe Zhang. Optimal and parameter-free gra- dient minimization methods for convex and nonconvex optimization.arXiv preprint arXiv:2310.12139, 2023

work page arXiv 2023

[15] [15]

A simple uniformly optimal method without line search for convex optimization.Mathematical Programming, pages 1–38, 2025

Tianjiao Li and Guanghui Lan. A simple uniformly optimal method without line search for convex optimization.Mathematical Programming, pages 1–38, 2025

work page 2025

[16] [16]

Accelerated first-order methods for convex opti- mization with locally lipschitz continuous gradient.SIAM Journal on Optimization, 33(3):2275–2310, 2023

Zhaosong Lu and Sanyou Mei. Accelerated first-order methods for convex opti- mization with locally lipschitz continuous gradient.SIAM Journal on Optimization, 33(3):2275–2310, 2023

work page 2023

[17] [17]

Primal-dual extrapolation methods for monotone inclusions under local lipschitz continuity.Mathematics of Operations Research, 50(4):2577–2599, 2025

Zhaosong Lu and Sanyou Mei. Primal-dual extrapolation methods for monotone inclusions under local lipschitz continuity.Mathematics of Operations Research, 50(4):2577–2599, 2025

work page 2025

[18] [18]

Adaptive gradient descent without descent

Yura Malitsky and Konstantin Mishchenko. Adaptive gradient descent without descent. InProceedings of the 37th International Conference on Machine Learning, volume 119 ofProceedings of Machine Learning Research, pages 6702–6712. PMLR, 2020. Universal and Parameter-free Gradient Sliding

work page 2020

[19] [19]

Adaptive proximal gradient method for convex optimization

Yura Malitsky and Konstantin Mishchenko. Adaptive proximal gradient method for convex optimization. InAdvances in Neural Information Processing Systems, volume 37, 2024

work page 2024

[20] [20]

Renato D. C. Monteiro, Camilo Ortiz, and Benar F. Svaiter. An adaptive acceler- ated first-order method for convex optimization.Computational Optimization and Applications, 64(1):31–73, 2016

work page 2016

[21] [21]

Renato D. C. Monteiro and Honghao Zhang. Parameter-free proximal bundle meth- ods with adaptive stepsizes for hybrid convex composite optimization problems. arXiv preprint arXiv:2410.20751, 2024

work page arXiv 2024

[22] [22]

Y. E. Nesterov. A method for unconstrained convex minimization problem with the rate of convergenceO(1/k2).Doklady AN SSSR, 269:543–547, 1983. translated as Soviet Math. Docl

work page 1983

[23] [23]

Y. E. Nesterov. Universal gradient methods for convex optimization problems. Mathematical Programming, 152(1-2):381–404, 2015

work page 2015

[24] [24]

Gradient methods for minimizing composite functions.Mathematical Programming, pages 1–37, 2012

Yu Nesterov. Gradient methods for minimizing composite functions.Mathematical Programming, pages 1–37, 2012

work page 2012

[25] [25]

Universal conditional gradient sliding for convex optimization.SIAM Journal on Optimization, 33(4):2962–2987, 2023

Yuyuan Ouyang and Trevor Squires. Universal conditional gradient sliding for convex optimization.SIAM Journal on Optimization, 33(4):2962–2987, 2023

work page 2023

[26] [26]

Proximal algorithms.Foundations and Trends® in Optimization, 1(3):127–239, 2014

Neal Parikh, Stephen Boyd, et al. Proximal algorithms.Foundations and Trends® in Optimization, 1(3):127–239, 2014

work page 2014

[27] [27]

Suh and Shiqian Ma

Jaewook J. Suh and Shiqian Ma. An adaptive and parameter-free nesterov’s accel- erated gradient method for convex optimization.arXiv preprint arXiv:2505.11670, 2025

work page arXiv 2025

[28] [28]

Arnesh Sujanani and Renato D. C. Monteiro. Efficient parameter-free restarted accelerated gradient methods for convex and strongly convex optimization.Journal of Optimization Theory and Applications, 206:52, 2025. Universal and Parameter-free Gradient Sliding A Omitted proofs In this appendix, we present several proofs that were omitted from the main text...

work page 2025