The Golden Ratio Proximal ADMM with Norm Independent Step-Sizes for Separable Convex Optimization

Santanu Soe; V. Vetrivel

arxiv: 2510.05779 · v2 · submitted 2025-10-07 · 🧮 math.OC

The Golden Ratio Proximal ADMM with Norm Independent Step-Sizes for Separable Convex Optimization

Santanu Soe , V. Vetrivel This is my paper

Pith reviewed 2026-05-18 09:12 UTC · model grok-4.3

classification 🧮 math.OC

keywords golden ratio proximal ADMMstep-size strategiesnorm-independent stepsseparable convex optimizationglobal convergencesublinear ratesalternating direction method

0 comments

The pith

Two new step-size rules let golden-ratio proximal ADMM solve separable convex problems without operator norm estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops two step-size selection strategies for the golden ratio proximal ADMM that solve linearly constrained separable convex optimization problems. The strategies use only cheap local information from the current iterate and require no backtracking or explicit norm computations. The second strategy can adapt by increasing the step-size and recovering from bad initial values. Global convergence of the iterates is proved along with sublinear rates under standard assumptions, and pointwise rates are obtained as well. The first rule reduces exactly to the fixed step-size version if the starting step-size is chosen small enough.

Core claim

The core discovery is that inexpensive local information suffices to choose step-sizes for the golden ratio proximal ADMM, yielding globally convergent algorithms with sublinear convergence rates for separable convex optimization problems with linear constraints.

What carries the argument

The golden ratio proximal ADMM equipped with the two norm-independent step-size strategies based on local iterate data.

If this is right

The generated iterates converge globally to an optimal solution.
Sublinear convergence rates are established for both proposed algorithms.
Pointwise convergence rates for the iterates are derived.
The first step-size rule coincides with a fixed-step-size method when the initial step-size is below a certain threshold.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These local step-size choices could lower the setup cost for large-scale problems where computing global operator norms is expensive.
The ability to increase step-sizes over time may lead to faster practical convergence in applications with varying curvature.
Similar adaptive rules might be developed for other splitting methods that currently depend on norm bounds.

Load-bearing premise

The problem data satisfy the convexity and constraint conditions assumed in the standard convergence theory for proximal ADMM.

What would settle it

Running the algorithms on a simple convex optimization problem with known solution and observing that the iterates do not approach the solution at the claimed rate would falsify the convergence results.

Figures

Figures reproduced from arXiv: 2510.05779 by Santanu Soe, V. Vetrivel.

**Figure 2.** Figure 2: ROF denoising reconstruction by different algorit [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗

**Figure 3.** Figure 3: Convergence plot for different methods: (a) relati [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗

**Figure 4.** Figure 4: TV-regularized image deblurring recovered by fou [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗

**Figure 5.** Figure 5: Convergence plots for the image deblurring experi [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗

**Figure 6.** Figure 6: Convergence plot summaries for the unbalanced opt [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 7.** Figure 7: Transport plans as heatmaps (rows: sources, colum [PITH_FULL_IMAGE:figures/full_fig_p022_7.png] view at source ↗

read the original abstract

In this work, we propose two step-size strategies for the Golden ratio proximal ADMM (GrpADMM) to solve linearly constrained separable convex optimization problems. Both strategies eliminate explicit operator norm estimates by relying on inexpensive local information computed at the current iterate and requiring no backtracking. However, the key difference is that the second step-size strategy allows recovery from poor initial steps and can increase from iteration to iteration. Under standard assumptions, we establish global convergence of the generated iterates and derive sublinear convergence rates for both algorithms. We also obtain pointwise convergence rate results for the iterates of the algorithms. In addition, we show that the first proposed step-size rule for GrpADMM reduces to the fixed-step-size counterpart when the initial step-size is chosen below a certain threshold. Preliminary numerical experiments demonstrate the practical adaptability and effectiveness of the proposed approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Two local step-size rules for golden-ratio proximal ADMM that skip operator-norm estimates, with one allowing upward adaptation and standard convergence proofs.

read the letter

The punchline is that this paper gives two concrete step-size rules for golden-ratio proximal ADMM on linearly constrained separable convex problems. Both avoid any explicit operator-norm calculation by using only inexpensive quantities computed from the current iterate, and neither uses backtracking. The second rule can raise the step size from one iteration to the next, which is meant to recover from a bad initial choice. They also show that the first rule collapses to ordinary fixed-step ADMM when the starting value is small enough. Under the usual convexity and qualification assumptions they prove global convergence plus sublinear rates on the objective and feasibility, plus pointwise rates on the iterates. Preliminary numerics are included to illustrate the adaptation in practice. That is the actual new piece relative to earlier GrpADMM work: the norm-free local rules and the non-decreasing option. The analysis follows the standard Lyapunov-style arguments for proximal ADMM variants, so the claims look plausible once the assumptions are granted. The reduction to fixed-step is a useful sanity check rather than a deep result. The numerics are labeled preliminary and do not appear to be a large-scale comparison against norm-based or backtracking baselines, which keeps the practical gain modest for now. No circularity or self-fitting issues show up in the stated claims. This is for people who actually code ADMM solvers for separable convex problems and want to sidestep the norm-estimation step. A reader who cares about implementable step-size selection in proximal methods will find the explicit rules and the convergence statements useful. It is solid enough incremental work with proofs and some experiments to merit a serious referee, even if the gains are practical rather than foundational. I would send it out for review.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes two step-size strategies for the Golden ratio proximal ADMM (GrpADMM) to solve linearly constrained separable convex optimization problems. Both strategies use inexpensive local information at the current iterate to avoid explicit operator norm estimates and do not require backtracking. The second strategy allows the step-size to increase. Under standard assumptions, global convergence and sublinear convergence rates are established for both, along with pointwise convergence rates. The first strategy reduces to fixed-step ADMM below a certain initial step-size threshold. Preliminary numerical experiments are provided to demonstrate effectiveness.

Significance. If the convergence analysis holds, this contribution is significant for practical implementation of ADMM-type methods in large-scale optimization, as it removes the need for computing operator norms which can be costly or difficult. The local information-based rules and the increasing step-size option enhance adaptability. Credit is due for deriving both sublinear and pointwise rates, and for showing the reduction to the fixed-step case. This could be useful for practitioners dealing with separable convex problems.

major comments (1)

[Convergence analysis section] The global convergence and rate proofs rely on the proposed local step-size rules satisfying the standard descent/Lyapunov inequality from proximal ADMM analyses. Explicitly confirm in the relevant theorem (around the statement of the two algorithms) that the rules meet this condition for all iterates without requiring additional boundedness or hidden assumptions on the local information.

minor comments (3)

[Algorithm descriptions] Clarify the exact form of the local information used in the step-size rules (e.g., which norm or inner product appears) to make the algorithms fully reproducible from the pseudocode alone.
[Numerical experiments] In the numerical section, add a brief discussion of how the increasing step-size rule recovers from poor initial choices, with reference to specific iteration counts or objective values in the tables/figures.
[Main results] The abstract states that one rule 'reduces to the fixed-step-size counterpart'; include a short remark or corollary in the main text showing the threshold condition explicitly.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and constructive comment. We address the major comment below.

read point-by-point responses

Referee: [Convergence analysis section] The global convergence and rate proofs rely on the proposed local step-size rules satisfying the standard descent/Lyapunov inequality from proximal ADMM analyses. Explicitly confirm in the relevant theorem (around the statement of the two algorithms) that the rules meet this condition for all iterates without requiring additional boundedness or hidden assumptions on the local information.

Authors: We appreciate the suggestion to make this verification more explicit. In the revised manuscript we will add a dedicated remark (or short lemma) immediately after the statements of the two algorithms and prior to the main convergence theorems. The remark will confirm that both proposed local step-size rules satisfy the standard descent/Lyapunov inequality at every iterate. This follows directly from the definition of the rules, which select the step-size using only finite, well-defined local quantities (e.g., residual norms or inner products evaluated at the current iterate) and requires no additional boundedness assumptions on the operators, the local information, or the sequence of iterates beyond the standard convexity, closedness, and constraint-qualification assumptions already stated in the paper. The existing proofs already proceed by direct substitution of these local choices into the Lyapunov function; the added remark will simply isolate and highlight this verification for the reader. revision: yes

Circularity Check

0 steps flagged

Standard ADMM convergence analysis; no load-bearing reduction to self-fit or self-citation

full rationale

The paper introduces two norm-independent step-size rules for the existing GrpADMM framework and proves global convergence plus sublinear/pointwise rates under standard convex optimization assumptions. These proofs follow the usual Lyapunov descent arguments for proximal ADMM variants; the step-size choices are shown to satisfy the required inequalities directly from local iterate information without backtracking or operator norms. No quoted derivation reduces a claimed rate or convergence result to a fitted parameter or to a prior self-citation by construction. The analysis is self-contained against external benchmarks for this class of methods.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The paper relies on standard convexity and qualification assumptions common to ADMM convergence theory; no new free parameters beyond the choice of initial step-size threshold are introduced in the abstract, and no new entities are postulated.

free parameters (1)

initial step-size threshold
The first strategy reduces to fixed-step ADMM when the initial step-size is chosen below a certain threshold; this threshold is a tunable parameter whose specific value is not derived from first principles.

axioms (1)

domain assumption Standard assumptions for convergence of proximal ADMM-type methods (convexity of objective, linear constraints, existence of solutions)
Invoked to establish global convergence and sublinear rates; location referenced in abstract as 'under standard assumptions'.

pith-pipeline@v0.9.0 · 5676 in / 1408 out tokens · 26816 ms · 2026-05-18T09:12:05.526680+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

ϕ := 1 + √5 / 2 … ψ ∈ (1,ϕ] … uk := (ψ−1)/ψ x^{k−1} + 1/ψ u^{k−1}
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration refines

?

refines
Relation between the paper passage and the cited Recognition theorem.

τ_k = min{τ_{k−1}, μ √λ_min(S) √β / ||A(x^k−x^{k−1})|| … } and the increasing rule with ρ ∈ (1,1/ψ+1/ψ²]

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

[1]

S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning , 3(1):1–122, 2011

work page 2011
[2]

Chambolle and T

A. Chambolle and T. Pock. A ﬁrst-order primal-dual algor ithm for convex problems with applications to imaging. Journal of mathematical imaging and vision , 40(1):120–145, 2011

work page 2011
[3]

Chang and J

X. Chang and J. Yang. A golden ratio primal–dual algorith m for structured convex opti- mization. Journal of Scientiﬁc Computing , 87(2):47, 2021

work page 2021
[4]

L. Chapel. Introduction to unbalanced optimal transpor t and its eﬃcient computational solu- tions. https://kantorovich.org/event/ki-seminar-chapel/Cha pel-slides.pdf, 2024. Slides; accessed 2025-09-14

work page 2024
[5]

C. Chen, R. H. Chan, S. Ma, and J. Yang. Inertial proximal a dmm for linearly constrained separable convex optimization. SIAM Journal on Imaging Sciences , 8(4):2239–2267, 2015

work page 2015
[6]

H. Chen, G. Gu, and J. Yang. A golden ratio proximal altern ating direction method of multipliers for separable convex optimization. Journal of Global Optimization , 87:581–602, 2023

work page 2023
[7]

Chizat, G

L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard. Unb alanced optimal transport: Dy- namic and kantorovich formulations. Journal of Functional Analysis , 274(11):3090–3123, 2018. 23

work page 2018
[8]

Eckstein

J. Eckstein. Some saddle-function splitting methods fo r convex programming. Optimization Methods and Software , 4(1):75–83, 1994

work page 1994
[9]

Eckstein and D

J. Eckstein and D. P. Bertsekas. On the douglas–rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55:293–318, 1992

work page 1992
[10]

Gabay and B

D. Gabay and B. Mercier. A dual algorithm for the solutio n of nonlinear variational problems via ﬁnite element approximation. Computers & Mathematics with Applications , 2:17–40, 1976

work page 1976
[11]

Glowinski and A

R. Glowinski and A. Marroco. Sur l’approximation à l’ai de de méthodes de pénalisation- dualité. M2AN, 9:41–76, 1975

work page 1975
[12]

Goldstein and S

T. Goldstein and S. Osher. The split bregman method for l 1-regularized problems. SIAM Journal on Imaging Sciences , 2(2):323–343, 2009

work page 2009
[13]

He, L.-Z

B. He, L.-Z. Liao, D. Han, and H. Yang. A new inexact alter nating directions method for monotone variational inequalities. Mathematical Programming, 92(1):103–118, 2002

work page 2002
[14]

M. R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4(5):303–320, 1969

work page 1969
[15]

Padcharoen, P

A. Padcharoen, P. Kumam, and J. Martínez-Moreno. Augme nted lagrangian method for tv-l1-l2 based colour image restoration. Journal of Computational and Applied Mathematics , 354:507–519, 2019

work page 2019
[16]

Parikh and S

N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization , 1(3):127–239, 2014

work page 2014
[17]

Peyré and M

G. Peyré and M. Cuturi. Computational Optimal Transport , volume 11. Foundations and Trends in Machine Learning, 2019

work page 2019
[18]

M. J. D. Powell. A method for nonlinear constraints in mi nimization problems. Academic Press, pages 283–298, 1969

work page 1969
[19]

Rockafellar

R. Rockafellar. Convex analysis, princeton: Princeto n u, 1970

work page 1970
[20]

L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total var iation based noise removal algo- rithms. Physica D: nonlinear phenomena , 60(1-4):259–268, 1992

work page 1992
[21]

Sheﬁ and M

R. Sheﬁ and M. Teboulle. Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization . SIAM Journal on Optimization , 24(1):269–297, 2014

work page 2014
[22]

Tao and X

M. Tao and X. Yuan. Recovering low-rank and sparse compo nents of matrices from incom- plete and noisy observations. SIAM Journal on Optimization , 21(1):57–81, 2011

work page 2011
[23]

Tibshirani

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statis- tical Society Series B: Statistical Methodology , 58(1):267–288, 1996

work page 1996
[24]

C. Villani. Topics in Optimal Transportation . American Mathematical Society, 2003. 24

work page 2003
[25]

Wang and X

X. Wang and X. Yuan. The linearized alternating directi on method of multipliers for dantzig selector. SIAM Journal on Scientiﬁc Computing , 34(5):A2792–A2811, 2012

work page 2012
[26]

Yang and X

J. Yang and X. Yuan. Linearized augmented lagrangian an d alternating direction methods for nuclear norm minimization. Mathematics of computation , 82(281):301–329, 2013

work page 2013
[27]

Yin and J

C. Yin and J. Yang. Golden ratio proximal gradient admm f or distributed composite convex optimization. Journal of Optimization Theory and Applications , 200(3):895–922, 2024

work page 2024
[28]

X. Yuan. Alternating direction method for covariance s election models. Journal of Scientiﬁc Computing, 51(2):261–273, 2012. 25

work page 2012

[1] [1]

S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning , 3(1):1–122, 2011

work page 2011

[2] [2]

Chambolle and T

A. Chambolle and T. Pock. A ﬁrst-order primal-dual algor ithm for convex problems with applications to imaging. Journal of mathematical imaging and vision , 40(1):120–145, 2011

work page 2011

[3] [3]

Chang and J

X. Chang and J. Yang. A golden ratio primal–dual algorith m for structured convex opti- mization. Journal of Scientiﬁc Computing , 87(2):47, 2021

work page 2021

[4] [4]

L. Chapel. Introduction to unbalanced optimal transpor t and its eﬃcient computational solu- tions. https://kantorovich.org/event/ki-seminar-chapel/Cha pel-slides.pdf, 2024. Slides; accessed 2025-09-14

work page 2024

[5] [5]

C. Chen, R. H. Chan, S. Ma, and J. Yang. Inertial proximal a dmm for linearly constrained separable convex optimization. SIAM Journal on Imaging Sciences , 8(4):2239–2267, 2015

work page 2015

[6] [6]

H. Chen, G. Gu, and J. Yang. A golden ratio proximal altern ating direction method of multipliers for separable convex optimization. Journal of Global Optimization , 87:581–602, 2023

work page 2023

[7] [7]

Chizat, G

L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard. Unb alanced optimal transport: Dy- namic and kantorovich formulations. Journal of Functional Analysis , 274(11):3090–3123, 2018. 23

work page 2018

[8] [8]

Eckstein

J. Eckstein. Some saddle-function splitting methods fo r convex programming. Optimization Methods and Software , 4(1):75–83, 1994

work page 1994

[9] [9]

Eckstein and D

J. Eckstein and D. P. Bertsekas. On the douglas–rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55:293–318, 1992

work page 1992

[10] [10]

Gabay and B

D. Gabay and B. Mercier. A dual algorithm for the solutio n of nonlinear variational problems via ﬁnite element approximation. Computers & Mathematics with Applications , 2:17–40, 1976

work page 1976

[11] [11]

Glowinski and A

R. Glowinski and A. Marroco. Sur l’approximation à l’ai de de méthodes de pénalisation- dualité. M2AN, 9:41–76, 1975

work page 1975

[12] [12]

Goldstein and S

T. Goldstein and S. Osher. The split bregman method for l 1-regularized problems. SIAM Journal on Imaging Sciences , 2(2):323–343, 2009

work page 2009

[13] [13]

He, L.-Z

B. He, L.-Z. Liao, D. Han, and H. Yang. A new inexact alter nating directions method for monotone variational inequalities. Mathematical Programming, 92(1):103–118, 2002

work page 2002

[14] [14]

M. R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4(5):303–320, 1969

work page 1969

[15] [15]

Padcharoen, P

A. Padcharoen, P. Kumam, and J. Martínez-Moreno. Augme nted lagrangian method for tv-l1-l2 based colour image restoration. Journal of Computational and Applied Mathematics , 354:507–519, 2019

work page 2019

[16] [16]

Parikh and S

N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization , 1(3):127–239, 2014

work page 2014

[17] [17]

Peyré and M

G. Peyré and M. Cuturi. Computational Optimal Transport , volume 11. Foundations and Trends in Machine Learning, 2019

work page 2019

[18] [18]

M. J. D. Powell. A method for nonlinear constraints in mi nimization problems. Academic Press, pages 283–298, 1969

work page 1969

[19] [19]

Rockafellar

R. Rockafellar. Convex analysis, princeton: Princeto n u, 1970

work page 1970

[20] [20]

L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total var iation based noise removal algo- rithms. Physica D: nonlinear phenomena , 60(1-4):259–268, 1992

work page 1992

[21] [21]

Sheﬁ and M

R. Sheﬁ and M. Teboulle. Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization . SIAM Journal on Optimization , 24(1):269–297, 2014

work page 2014

[22] [22]

Tao and X

M. Tao and X. Yuan. Recovering low-rank and sparse compo nents of matrices from incom- plete and noisy observations. SIAM Journal on Optimization , 21(1):57–81, 2011

work page 2011

[23] [23]

Tibshirani

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statis- tical Society Series B: Statistical Methodology , 58(1):267–288, 1996

work page 1996

[24] [24]

C. Villani. Topics in Optimal Transportation . American Mathematical Society, 2003. 24

work page 2003

[25] [25]

Wang and X

X. Wang and X. Yuan. The linearized alternating directi on method of multipliers for dantzig selector. SIAM Journal on Scientiﬁc Computing , 34(5):A2792–A2811, 2012

work page 2012

[26] [26]

Yang and X

J. Yang and X. Yuan. Linearized augmented lagrangian an d alternating direction methods for nuclear norm minimization. Mathematics of computation , 82(281):301–329, 2013

work page 2013

[27] [27]

Yin and J

C. Yin and J. Yang. Golden ratio proximal gradient admm f or distributed composite convex optimization. Journal of Optimization Theory and Applications , 200(3):895–922, 2024

work page 2024

[28] [28]

X. Yuan. Alternating direction method for covariance s election models. Journal of Scientiﬁc Computing, 51(2):261–273, 2012. 25

work page 2012