bAdag: an adaptive block coordinate gradient method for smooth nonconvex functions

Giovanni Seraghiti

arxiv: 2606.11791 · v1 · pith:AHTAYTDVnew · submitted 2026-06-10 · 🧮 math.OC

bAdag: an adaptive block coordinate gradient method for smooth nonconvex functions

Giovanni Seraghiti This is my paper

Pith reviewed 2026-06-27 08:58 UTC · model grok-4.3

classification 🧮 math.OC

keywords block coordinate gradientadaptive methodsnonconvex optimizationAdaGradconvergence ratesGauss-SouthwellOFFO methods

0 comments

The pith

bAdag adapts step sizes from cumulative block gradients to prove sublinear ergodic convergence for smooth nonconvex minimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces bAdag, a block coordinate gradient method modeled on AdaGrad that sets each step size from the running sum of squared block gradients instead of full gradients. It proves that the ergodic average of squared gradient norms reaches zero at a sublinear rate for smooth nonconvex objectives. The same rates hold under three block selection rules: cyclic, uniform random, and Gauss-Southwell. The method requires no objective evaluations and extends to box-constrained problems. This supplies convergence guarantees for adaptive block methods on large-scale nonconvex tasks where full gradients are expensive.

Core claim

bAdag computes an adaptive step size from the cumulative sum of squared block gradients at each iteration and establishes that the ergodic average of the squared gradient norms converges to zero at a sublinear rate when the objective is smooth and possibly nonconvex, provided the gradient satisfies block Lipschitz continuity; the result holds for cyclic, uniform random, and Gauss-Southwell selection and carries over to box-constrained problems.

What carries the argument

Adaptive step size computed from the cumulative sum of squared norms of the selected block gradients, replacing the full-gradient accumulation of standard AdaGrad.

If this is right

Ergodic sublinear rates hold uniformly for cyclic, uniform random, and Gauss-Southwell block selection.
The same rates and analysis extend to box-constrained smooth problems.
The algorithm never evaluates the objective function value.
Numerical experiments on synthetic and real-world instances confirm the predicted behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The uniform rate across selection rules suggests that practical speed differences among the three rules arise mainly from implementation cost rather than asymptotic guarantees.
The OFFO property opens the door to applying the same cumulative-block mechanism inside other first-order frameworks that avoid function evaluations.
If the block Lipschitz assumption can be replaced by a weaker local condition, the method could apply to a broader class of nonconvex problems without changing the algorithm.

Load-bearing premise

The gradient satisfies block Lipschitz continuity.

What would settle it

A smooth nonconvex function whose gradient meets the block Lipschitz condition, yet the ergodic average of squared gradient norms fails to approach zero when bAdag is run with any of the three selection rules.

Figures

Figures reproduced from arXiv: 2606.11791 by Giovanni Seraghiti.

**Figure 2.** Figure 2: Huber-NMF objective function values along the iterations [PITH_FULL_IMAGE:figures/full_fig_p023_2.png] view at source ↗

**Figure 3.** Figure 3: Visual reconstruction of 64 randomly selected images fro [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗

read the original abstract

A new Block Coordinate Gradient (BCG) method, dubbed bAdag, for smooth, nonconvex minimization problem is proposed; it falls in the class of Objective Function Free Optimization (OFFO) methods, and it is based on the AdaGrad algorithm. At each iteration, our method computes an adaptive step size based on the cumulative sum of block gradients, instead of full gradients as in AdaGrad-type methods. We prove ergodic, sublinear convergence rates for the bAdag algorithm when minimizing a smooth, possibly nonconvex objective under the (block) Lipschitz continuity assumption on the gradient. Our theory covers three widely popular block selection strategies: the Cyclic (C) rule, Uniform Random selection (UR), and the greedy Gauss-Southwell (GS) rule. We also extend our algorithm and its convergence theory to box-constrained smooth functions. We validate the proposed algorithms through synthetic and real-world experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

bAdag gives a block-wise AdaGrad step size for nonconvex BCG with ergodic sublinear rates under block Lipschitz gradients, covering cyclic, random, and Gauss-Southwell rules plus a box-constraint extension.

read the letter

The paper introduces bAdag, an adaptive block coordinate gradient method that sets steps from the cumulative sum of block gradients rather than full gradients. It supplies ergodic sublinear convergence for smooth nonconvex objectives under the standard block Lipschitz assumption, and the theory is written for cyclic, uniform random, and Gauss-Southwell selection. A box-constrained version is included with matching guarantees.

The work is useful because it collects three common selection rules under one adaptive framework and carries the analysis through to the constrained case without adding hidden requirements. The assumption is stated explicitly and the rates follow from it in the usual way, so the argument is not circular.

The rates themselves are the ordinary sublinear ones for nonconvex coordinate methods; nothing faster or stronger appears. Experiments are mentioned on synthetic and real data, but the abstract gives no quantitative comparison to existing adaptive BCG schemes, which leaves the practical edge unclear.

This is a solid incremental piece for researchers who already work on block coordinate or adaptive first-order methods. A reader looking for a reference that handles multiple selection rules and the box case in one place will find it straightforward to use.

The manuscript is coherent on its own terms and cites the relevant literature without distortion. It should go to peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes bAdag, an adaptive block coordinate gradient method based on AdaGrad for minimizing smooth nonconvex functions. It proves ergodic sublinear convergence rates under the standard block Lipschitz continuity assumption on the gradient, covering the cyclic, uniform random, and Gauss-Southwell block selection rules. The theory and algorithm are extended to box-constrained problems, with validation on synthetic and real-world experiments.

Significance. If the stated rates hold, the work supplies convergence guarantees for an adaptive BCG method in the nonconvex regime under the block-Lipschitz assumption, extending to three common selection strategies and box constraints. This is a useful addition to the OFFO literature, where adaptive step-size constructions with block-coordinate analysis remain relatively sparse.

minor comments (3)

[§2] §2, definition of the cumulative block-gradient accumulator: the notation G_k^{(i)} is introduced without an explicit recursive formula; adding the update equation would improve readability.
[Theorem 3.1] Theorem 3.1 (ergodic rate): the dependence of the constant on the block-Lipschitz parameter L_i is stated but the explicit scaling with the number of blocks is not highlighted; a remark clarifying this would help.
[§5] Numerical section: the comparison baselines do not include a non-adaptive BCG method with the same block selection rules; adding one would strengthen the empirical claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript on bAdag and the recommendation for minor revision. The referee's summary correctly reflects the paper's contributions regarding the adaptive block coordinate method, convergence theory under block Lipschitz assumptions, coverage of multiple selection rules, extension to box constraints, and experimental validation.

Circularity Check

0 steps flagged

No significant circularity; rates derived from external Lipschitz assumption

full rationale

The paper states an explicit external assumption (block Lipschitz continuity of the gradient) and derives ergodic sublinear rates for bAdag under that assumption for standard selection rules (C/UR/GS) and a box-constraint extension. No step reduces a claimed result to a quantity defined from the algorithm outputs themselves, no fitted parameter is relabeled as a prediction, and no load-bearing self-citation chain is present in the abstract or description. The derivation chain is therefore self-contained against the stated assumption.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of block Lipschitz continuity of the gradient; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption The gradient of the objective is (block) Lipschitz continuous
Invoked explicitly in the abstract as the condition under which the convergence rates hold.

pith-pipeline@v0.9.1-grok · 5680 in / 1151 out tokens · 27117 ms · 2026-06-27T08:58:58.850076+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

96 extracted references · 1 canonical work pages

[1]

Musco, C

Mushroom. UCI Machine Learning Repository, 1981. DOI: h ttps://doi.org/10.24432/C5959T. [2] . UCI machine learning repository, 2013

work page doi:10.24432/c5959t 1981
[2]

An accelerated coordinate g radient descent algorithm for non-separable composite optimization

Aviad Aberdam and Amir Beck. An accelerated coordinate g radient descent algorithm for non-separable composite optimization. Journal of Optimization Theory and Applications , 193(1):219–246, 2022

2022
[3]

Even faster accelerated coordinate descent using non-uniform sampling

Zeyuan Allen-Zhu, Zheng Qu, Peter Richt´ arik, and Yang Y uan. Even faster accelerated coordinate descent using non-uniform sampling. In International Conference on Machine Learning , pages 1110–1119. PMLR, 2016

2016
[4]

Accelerating n onnegative matrix factorization algorithms using extrapolation

Andersen Man Shun Ang and Nicolas Gillis. Accelerating n onnegative matrix factorization algorithms using extrapolation. Neural Computation , 31(2):417–439, 2019

2019
[5]

Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-/suppress lojasiewicz inequality

H´ edy Attouch, J´ erˆ ome Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-/suppress lojasiewicz inequality. Mathematics of Operations Research , 35(2):438–457, 2010

2010
[6]

Con vergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods

Hedy Attouch, J´ erˆ ome Bolte, and Benar Fux Svaiter. Con vergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical programming, 137(1):91–129, 2013

2013
[7]

Optimisation

Alfred Auslender. Optimisation. M´ ethodes num´ eriques, 1976

1976
[8]

Instance-wise distributionally robust nonnegative matrix factorization

W afa Barkhoda, Amjad Seyedi, Nicolas Gillis, and Fardin Akhlaghian Tab. Instance-wise distributionally robust nonnegative matrix factorization. Pattern Recognition, 169:111732, 2026

2026
[9]

On the convergence of alternating minimizat ion for convex programming with applications to iteratively reweighted least squares and decomposition sc hemes

Amir Beck. On the convergence of alternating minimizat ion for convex programming with applications to iteratively reweighted least squares and decomposition sc hemes. SIAM Journal on Optimization , 25(1):185– 209, 2015. 25

2015
[10]

On the convergence of b lock coordinate descent type methods

Amir Beck and Luba Tetruashvili. On the convergence of b lock coordinate descent type methods. SIAM journal on Optimization , 23(4):2037–2060, 2013

2037
[11]

Fast stochastic second-order adagrad for nonconvex bound-constrained optimization

Stefania Bellavia, Serge Gratton, Benedetta Morini, a nd Ph L Toint. Fast stochastic second-order adagrad for nonconvex bound-constrained optimization. arXiv preprint arXiv:2505.06374 , 2025

Pith/arXiv arXiv 2025
[12]

Nonlinear programming

Dimitri P Bertsekas. Nonlinear programming. Journal of the Operational Research Society , 48(3):334–334, 1997

1997
[13]

Hyperspectral unmixing overview: Geometri cal, statistical, and sparse regression-based ap- proaches

Jos´ e M Bioucas-Dias, Antonio Plaza, Nicolas Dobigeon , Mario Parente, Qian Du, Paul Gader, and Joce- lyn Chanussot. Hyperspectral unmixing overview: Geometri cal, statistical, and sparse regression-based ap- proaches. IEEE Journal of Selected Topics in Applied Earth Observatio ns and Remote Sensing , 5(2):354–379, 2012

2012
[14]

Proximal alternating linearized minimization for nonconvex and nonsmooth problems

J´ erˆ ome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1):459–494, 2014

2014
[15]

A cyclic block coordinate descent method with gener- alized gradient projections

Silvia Bonettini, Marco Prato, and Simone Rebegoldi. A cyclic block coordinate descent method with gener- alized gradient projections. Applied Mathematics and Computation , 286:288–300, 2016

2016
[16]

Cyclic block coordinate descent with variance reduction for composite nonconvex optimization

Xufeng Cai, Chaobing Song, Stephen W right, and Jelena D iakonikolas. Cyclic block coordinate descent with variance reduction for composite nonconvex optimization. In International Conference on Machine Learning , pages 3469–3494. PMLR, 2023

2023
[17]

E nhancing sparsity by reweighted ℓ1 minimiza- tion

Emmanuel J Candes, Michael B W akin, and Stephen P Boyd. E nhancing sparsity by reweighted ℓ1 minimiza- tion. Journal of Fourier Analysis and Applications , 14(5):877–905, 2008

2008
[18]

LIBSVM: A library fo r support vector machines

Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library fo r support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) , 2(3):1–27, 2011

2011
[19]

Trust region methods

Andrew R Conn, Nicholas IM Gould, and Philippe L Toint. Trust region methods . SIAM, 2000

2000
[20]

Global convergenc e of arbitrary-block gradient methods for generalized Polyak- Lojasiewicz functions

Dominik Csiba and Peter Richt´ arik. Global convergenc e of arbitrary-block gradient methods for generalized Polyak- Lojasiewicz functions. arXiv preprint arXiv:1709.03014 , 2017

Pith/arXiv arXiv 2017
[21]

A simple convergence proof of Adam and Adagrad

Alexandre D´ efossez, Leon Bottou, Francis Bach, and Ni colas Usunier. A simple convergence proof of Adam and Adagrad. Transactions on Machine Learning Research , 2022

2022
[22]

Nearest neighbor based greedy coordinate descent

Inderjit Dhillon, Pradeep Ravikumar, and Ambuj Tewari . Nearest neighbor based greedy coordinate descent. Advances in Neural Information Processing Systems , 24, 2011

2011
[23]

Alternatin g randomized block coordinate descent

Jelena Diakonikolas and Lorenzo Orecchia. Alternatin g randomized block coordinate descent. In International Conference on Machine Learning , pages 1224–1232. PMLR, 2018

2018
[24]

On the equivalence betw een non-negative matrix factorization and proba- bilistic latent semantic indexing

Chris Ding, Tao Li, and W ei Peng. On the equivalence betw een non-negative matrix factorization and proba- bilistic latent semantic indexing. Computational Statistics & Data Analysis , 52(8):3913–3927, 2008

2008
[25]

Benchmarking optim ization software with performance proﬁles

Elizabeth D Dolan and Jorge J Mor´ e. Benchmarking optim ization software with performance proﬁles. Math- ematical Programming, 91(2):201–213, 2002

2002
[26]

Robust nonnegative matrix factorization via half-quadratic minimiza- tion

Liang Du, Xuan Li, and Yi-Dong Shen. Robust nonnegative matrix factorization via half-quadratic minimiza- tion. In 2012 IEEE 12th International Conference on Data Mining , pages 201–210. IEEE, 2012

2012
[27]

Adaptive subg radient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subg radient methods for online learning and stochastic optimization. Journal of Machine Learning Research , 12(7), 2011

2011
[28]

Lever aging two reference functions in block Bregman proximal gradient descent for non-convex and non-lipschit z problems

Tianxiang Gao, Songtao Lu, Jia Liu, and Chris Chu. Lever aging two reference functions in block Bregman proximal gradient descent for non-convex and non-lipschit z problems. arXiv preprint arXiv:1912.07527 , 2019

arXiv 1912
[29]

On the convergence of Randomized Bregman Coordinate Descent for Non-Lipschitz Composite Problems

Tianxiang Gao, Songtao Lu, Jia Liu, and Chris Chu. On the convergence of Randomized Bregman Coordinate Descent for Non-Lipschitz Composite Problems. In 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages 5549–5553, 2021

2021
[30]

Nonnegative Matrix Factorization

Nicolas Gillis. Nonnegative Matrix Factorization . SIAM, Philadelphia, 2020

2020
[31]

Complexity o f a class of ﬁrst-order objective-function-free optimization algorithms

Serge Gratton, Sadok Jerad, and Ph L Toint. Complexity o f a class of ﬁrst-order objective-function-free optimization algorithms. Optimization Methods and Software , pages 1–31, 2024

2024
[32]

Complexity a nd performance for two classes of noise-tolerant ﬁrst-order algorithms

Serge Gratton, Sadok Jerad, and Ph L Toint. Complexity a nd performance for two classes of noise-tolerant ﬁrst-order algorithms. Optimization Methods and Software , pages 1–27, 2025

2025
[33]

Param etric complexity analysis for a class of ﬁrst-order adagrad-like algorithms

Serge Gratton, Sadok Jerad, and Philippe L Toint. Param etric complexity analysis for a class of ﬁrst-order adagrad-like algorithms. arXiv preprint arXiv:2203.01647 , 2022

arXiv 2022
[34]

Compl exity of adagrad and other ﬁrst-order methods for nonconvex optimization problems with bounds constraints

Serge Gratton, Sadok Jerad, and Philippe L Toint. Compl exity of adagrad and other ﬁrst-order methods for nonconvex optimization problems with bounds constraints. arXiv preprint arXiv:2406.15793 , 2024

arXiv 2024
[35]

S2MPJ and CUTEst optimizat ion problems for Matlab, Python and Julia

Serge Gratton and Ph L Toint. S2MPJ and CUTEst optimizat ion problems for Matlab, Python and Julia. Optimization Methods and Software , 40(4):871–903, 2025

2025
[36]

A uniﬁed convergence theor y for adaptive ﬁrst-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampo o and muon

Serge Gratton and Ph L Toint. A uniﬁed convergence theor y for adaptive ﬁrst-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampo o and muon. arXiv preprint arXiv:2604.17423 , 2026. 26

Pith/arXiv arXiv 2026
[37]

Stochastic converg ence of parallel asynchronous adaptive ﬁrst-order methods

Serge Gratton and Philippe L Toint. Stochastic converg ence of parallel asynchronous adaptive ﬁrst-order methods. arXiv preprint arXiv:2606.01787 , 2026

Pith/arXiv arXiv 2026
[38]

On the convergence o f the block nonlinear Gauss-Seidel method under convex constraints

Luigi Grippo and Marco Sciandrone. On the convergence o f the block nonlinear Gauss-Seidel method under convex constraints. Operations Research Letters, 26(3):127–136, 2000

2000
[39]

MahNMF: Manhattan non-negative matrix factorization

Naiyang Guan, Dacheng Tao, Zhigang Luo, and John Shawe- Taylor. MahNMF: Manhattan non-negative matrix factorization. arXiv preprint arXiv:1207.3438 , 2012

Pith/arXiv arXiv 2012
[40]

Non-negative matrix factorization for face recognition

David Guillamet and Jordi Vitria. Non-negative matrix factorization for face recognition. In Catalonian Conference on Artiﬁcial Intelligence , pages 336–344. Springer, 2002

2002
[41]

A modiﬁed Huber nonnegative matrix factorization algorithm for hyperspectral unmixing

Ziyang Guo, Anyou Min, Bing Yang, Junhong Chen, and Hong Li. A modiﬁed Huber nonnegative matrix factorization algorithm for hyperspectral unmixing. IEEE Journal of Selected Topics in Applied Earth Obser- vations and Remote Sensing , 14:5559–5571, 2021

2021
[42]

Shampoo: P reconditioned stochastic tensor optimization

Vineet Gupta, Tomer Koren, and Yoram Singer. Shampoo: P reconditioned stochastic tensor optimization. In International Conference on Machine Learning , pages 1842–1850. PMLR, 2018

2018
[43]

Fastest rates for s tochastic mirror descent methods

Filip Hanzely and Peter Richt´ arik. Fastest rates for s tochastic mirror descent methods. Computational Opti- mization and Applications , 79(3):717–766, 2021

2021
[44]

Accelera ted Bregman proximal gradient methods for relatively smooth convex optimization

Filip Hanzely, Peter Richtarik, and Lin Xiao. Accelera ted Bregman proximal gradient methods for relatively smooth convex optimization. Computational Optimization and Applications , 79(2):405–440, 2021

2021
[45]

Block majorization minimization with extrapolation and application to NMF

Le Thi Khanh Hien, Valentin Leplat, and Nicolas Gillis. Block majorization minimization with extrapolation and application to NMF. SIAM Journal on Mathematics of Data Science , 7(3):1292–1314, 2025

2025
[46]

Iteration complexity analysis of block coordinate descent methods

Mingyi Hong, Xiangfeng W ang, Meisam Razaviyayn, and Zh i-Quan Luo. Iteration complexity analysis of block coordinate descent methods. Mathematical Programming, 163(1):85–114, 2017

2017
[47]

Muon: An optimizer for hidden layers in neural networks, 202 4

Keller Jordan, Yuchen Jin, Vlado Boza, Jiacheng You, Fr anz Cesista, Laker Newhouse, and Jeremy Bernstein. Muon: An optimizer for hidden layers in neural networks, 202 4
[48]

Robust ℓ1-norm factorization in the presence of outliers and missing data by alternative convex programming

Qifa Ke and Takeo Kanade. Robust ℓ1-norm factorization in the presence of outliers and missing data by alternative convex programming. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 739–746, 2005

2005
[49]

Adam: a method for stochastic optimi zation

Diederik P Kingma. Adam: a method for stochastic optimi zation. arXiv preprint arXiv:1412.6980 , 2014

Pith/arXiv arXiv 2014
[50]

Robust nonneg ative matrix factorization using l21-norm

Deguang Kong, Chris Ding, and Heng Huang. Robust nonneg ative matrix factorization using l21-norm. In Proceedings of the 20th ACM International Conference on Inf ormation and Knowledge Management , pages 673–682, 2011

2011
[51]

Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problem s

Puya Latafat, Andreas Themelis, and Panagiotis Patrin os. Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problem s. Mathematical Programming, 193(1):195–224, 2022

2022
[52]

Gradient-based learning applied to document recognition

Yann LeCun, L´ eon Bottou, Yoshua Bengio, and Patrick Ha ﬀner. Gradient-based learning applied to document recognition. Proceedings of the IEEE , 86(11):2278–2324, 1998

1998
[53]

Random permutation s ﬁx a worst case for cyclic coordinate descent

Ching-Pei Lee and Stephen J W right. Random permutation s ﬁx a worst case for cyclic coordinate descent. IMA Journal of Numerical Analysis , 39(3):1246–1275, 2019

2019
[54]

Learning the parts of objects by non-negative matrix factorization

Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. nature, 401(6755):788–791, 1999

1999
[55]

Eﬃcient accelerated coor dinate descent methods and faster algorithms for solving linear systems

Yin Tat Lee and Aaron Sidford. Eﬃcient accelerated coor dinate descent methods and faster algorithms for solving linear systems. In 2013 IEEE 54th Annual Symposium on Foundations of Computer S cience, pages 147–156. IEEE, 2013

2013
[56]

Coordinate-w ise power method

Qi Lei, Kai Zhong, and Inderjit S Dhillon. Coordinate-w ise power method. Advances in neural information processing systems, 29, 2016

2016
[57]

On faster convergence of cyclic block coordinate descent-type methods for strongly convex minim ization

Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, and Mingyi Ho ng. On faster convergence of cyclic block coordinate descent-type methods for strongly convex minim ization. Journal of Machine Learning Research , 18(184):1–24, 2018

2018
[58]

An improved Adam optimization algorithm combining adaptive coeﬃcients and composite gra dients based on randomized block coordinate descent

Miaomiao Liu, Dan Yao, Zhigang Liu, Jingfeng Guo, and Ji ng Chen. An improved Adam optimization algorithm combining adaptive coeﬃcients and composite gra dients based on randomized block coordinate descent. Computational Intelligence and Neuroscience , 2023(1):4765891, 2023

2023
[59]

Randomized block coordinate n on-monotone gradient method for a class of nonlinear programming

Zhaosong Lu and Lin Xiao. Randomized block coordinate n on-monotone gradient method for a class of nonlinear programming. arXiv preprint arXiv:1306.5918 , 1273, 2013

Pith/arXiv arXiv 2013
[60]

On the complexity analysis of r andomized block-coordinate descent methods

Zhaosong Lu and Lin Xiao. On the complexity analysis of r andomized block-coordinate descent methods. Mathematical Programming, 152(1):615–642, 2015

2015
[61]

On the convergence of the co ordinate descent method for convex diﬀerentiable minimization

Zhi-Quan Luo and Paul Tseng. On the convergence of the co ordinate descent method for convex diﬀerentiable minimization. Journal of Optimization Theory and Applications , 72(1):7–35, 1992

1992
[62]

A signal processi ng perspective on hyperspectral unmixing: Insights from remote sensing

Wing-Kin Ma, Jos´ e M Bioucas-Dias, Tsung-Han Chan, Nic olas Gillis, Paul Gader, Antonio J Plaza, Arul- Murugan Ambikapathi, and Chong-Yung Chi. A signal processi ng perspective on hyperspectral unmixing: Insights from remote sensing. IEEE Signal Processing Magazine , 31(1):67–81, 2013. 27

2013
[63]

Adaptive bound optimization for online convex optimization

H Brendan McMahan and Matthew Streeter. Adaptive bound optimization for online convex optimization. arXiv preprint arXiv:1002.4908 , 2010

Pith/arXiv arXiv 2010
[64]

Eﬃciency of stochast ic coordinate proximal gradient methods on nonsep- arable composite optimization

Ion Necoara and Flavia Chorobura. Eﬃciency of stochast ic coordinate proximal gradient methods on nonsep- arable composite optimization. Mathematics of Operations Research , 50(2):993–1018, 2025

2025
[65]

Parallel random coordi nate descent method for composite minimization: Convergence analysis and error bounds

Ion Necoara and Dragos Clipici. Parallel random coordi nate descent method for composite minimization: Convergence analysis and error bounds. SIAM Journal on Optimization , 26(1):197–226, 2016

2016
[66]

Eﬃciency of coordinate descent methods on huge-scale optimization problems

Yu Nesterov. Eﬃciency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization , 22(2):341–362, 2012

2012
[67]

Eﬃciency of the ac celerated coordinate descent method on structured optimization problems

Yurii Nesterov and Sebastian U Stich. Eﬃciency of the ac celerated coordinate descent method on structured optimization problems. SIAM Journal on Optimization , 27(1):110–123, 2017

2017
[68]

Let’s ma ke block coordinate descent converge faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, an d Superlinear Convergence

Julie Nutini, Issam Laradji, and Mark Schmidt. Let’s ma ke block coordinate descent converge faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, an d Superlinear Convergence. Journal of Machine Learning Research, 23(131):1–74, 2022

2022
[69]

Coordinate descent converges faster with the Gauss-Southwell rule than random selection

Julie Nutini, Mark Schmidt, Issam Laradji, Michael Fri edlander, and Hoyt Koepke. Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In International Conference on Machine Learning, pages 1632–1641. PMLR, 2015

2015
[70]

Eﬃcient random coordi nate descent algorithms for large-scale structured nonconvex optimization

Andrei Patrascu and Ion Necoara. Eﬃcient random coordi nate descent algorithms for large-scale structured nonconvex optimization. Journal of Global Optimization , 61(1):19–46, 2015

2015
[71]

Coordinate friendly structures, algorithms and applications

Zhimin Peng, Tianyu W u, Yangyang Xu, Ming Yan, and W otao Yin. Coordinate friendly structures, algorithms and applications. arXiv preprint arXiv:1601.00863 , 2016

Pith/arXiv arXiv 2016
[72]

prunadag: an adaptive pruning-aware gradient method

Margherita Porcelli, Giovanni Seraghiti, and Philipp e L Toint. prunadag: an adaptive pruning-aware gradient method. Computational Optimization and Applications , 2026

2026
[73]

A uni ﬁed convergence analysis of block successive minimization methods for nonsmooth optimization

Meisam Razaviyayn, Mingyi Hong, and Zhi-Quan Luo. A uni ﬁed convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization , 23(2):1126–1153, 2013

2013
[74]

Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

Peter Richt´ arik and Martin Tak´ aˇ c. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(1):1–38, 2014

2014
[75]

On the ﬁnite time convergen ce of cyclic coordinate descent methods

Ankan Saha and Ambuj Tewari. On the ﬁnite time convergen ce of cyclic coordinate descent methods. arXiv preprint arXiv:1005.2146 , 2010

Pith/arXiv arXiv 2010
[76]

Nonnegative matrix factorization in the component-wise l1 norm for sparse data

Giovanni Seraghiti, K´ evin Dubrulle, Arnaud Vandaele , and Nicolas Gillis. Nonnegative matrix factorization in the component-wise l1 norm for sparse data. arXiv preprint arXiv:2603.29715 , 2026

arXiv 2026
[77]

Document clustering using non- negative matrix factorization

Farial Shahnaz, Michael W Berry, V Paul Pauca, and Rober t J Plemmons. Document clustering using non- negative matrix factorization. Information Processing & Management , 42(2):373–386, 2006

2006
[78]

Stochastic meth ods for ℓ1 regularized loss minimization

Shai Shalev-Shwartz and Ambuj Tewari. Stochastic meth ods for ℓ1 regularized loss minimization. In Proceed- ings of the 26th Annual International Conference on Machine Learning, pages 929–936, 2009

2009
[79]

A primer on coordinate descent algo- rithms

Hao-Jun Michael Shi, Shenyinying Tu, Yangyang Xu, and W otao Yin. A primer on coordinate descent algo- rithms. arXiv preprint arXiv:1610.00040 , 2016

Pith/arXiv arXiv 2016
[80]

Cyclic coordin ate dual averaging with extrapolation

Chaobing Song and Jelena Diakonikolas. Cyclic coordin ate dual averaging with extrapolation. SIAM Journal on Optimization , 33(4):2935–2961, 2023

2023

Showing first 80 references.

[1] [1]

Musco, C

Mushroom. UCI Machine Learning Repository, 1981. DOI: h ttps://doi.org/10.24432/C5959T. [2] . UCI machine learning repository, 2013

work page doi:10.24432/c5959t 1981

[2] [2]

An accelerated coordinate g radient descent algorithm for non-separable composite optimization

Aviad Aberdam and Amir Beck. An accelerated coordinate g radient descent algorithm for non-separable composite optimization. Journal of Optimization Theory and Applications , 193(1):219–246, 2022

2022

[3] [3]

Even faster accelerated coordinate descent using non-uniform sampling

Zeyuan Allen-Zhu, Zheng Qu, Peter Richt´ arik, and Yang Y uan. Even faster accelerated coordinate descent using non-uniform sampling. In International Conference on Machine Learning , pages 1110–1119. PMLR, 2016

2016

[4] [4]

Accelerating n onnegative matrix factorization algorithms using extrapolation

Andersen Man Shun Ang and Nicolas Gillis. Accelerating n onnegative matrix factorization algorithms using extrapolation. Neural Computation , 31(2):417–439, 2019

2019

[5] [5]

Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-/suppress lojasiewicz inequality

H´ edy Attouch, J´ erˆ ome Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-/suppress lojasiewicz inequality. Mathematics of Operations Research , 35(2):438–457, 2010

2010

[6] [6]

Con vergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods

Hedy Attouch, J´ erˆ ome Bolte, and Benar Fux Svaiter. Con vergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical programming, 137(1):91–129, 2013

2013

[7] [7]

Optimisation

Alfred Auslender. Optimisation. M´ ethodes num´ eriques, 1976

1976

[8] [8]

Instance-wise distributionally robust nonnegative matrix factorization

W afa Barkhoda, Amjad Seyedi, Nicolas Gillis, and Fardin Akhlaghian Tab. Instance-wise distributionally robust nonnegative matrix factorization. Pattern Recognition, 169:111732, 2026

2026

[9] [9]

On the convergence of alternating minimizat ion for convex programming with applications to iteratively reweighted least squares and decomposition sc hemes

Amir Beck. On the convergence of alternating minimizat ion for convex programming with applications to iteratively reweighted least squares and decomposition sc hemes. SIAM Journal on Optimization , 25(1):185– 209, 2015. 25

2015

[10] [10]

On the convergence of b lock coordinate descent type methods

Amir Beck and Luba Tetruashvili. On the convergence of b lock coordinate descent type methods. SIAM journal on Optimization , 23(4):2037–2060, 2013

2037

[11] [11]

Fast stochastic second-order adagrad for nonconvex bound-constrained optimization

Stefania Bellavia, Serge Gratton, Benedetta Morini, a nd Ph L Toint. Fast stochastic second-order adagrad for nonconvex bound-constrained optimization. arXiv preprint arXiv:2505.06374 , 2025

Pith/arXiv arXiv 2025

[12] [12]

Nonlinear programming

Dimitri P Bertsekas. Nonlinear programming. Journal of the Operational Research Society , 48(3):334–334, 1997

1997

[13] [13]

Hyperspectral unmixing overview: Geometri cal, statistical, and sparse regression-based ap- proaches

Jos´ e M Bioucas-Dias, Antonio Plaza, Nicolas Dobigeon , Mario Parente, Qian Du, Paul Gader, and Joce- lyn Chanussot. Hyperspectral unmixing overview: Geometri cal, statistical, and sparse regression-based ap- proaches. IEEE Journal of Selected Topics in Applied Earth Observatio ns and Remote Sensing , 5(2):354–379, 2012

2012

[14] [14]

Proximal alternating linearized minimization for nonconvex and nonsmooth problems

J´ erˆ ome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1):459–494, 2014

2014

[15] [15]

A cyclic block coordinate descent method with gener- alized gradient projections

Silvia Bonettini, Marco Prato, and Simone Rebegoldi. A cyclic block coordinate descent method with gener- alized gradient projections. Applied Mathematics and Computation , 286:288–300, 2016

2016

[16] [16]

Cyclic block coordinate descent with variance reduction for composite nonconvex optimization

Xufeng Cai, Chaobing Song, Stephen W right, and Jelena D iakonikolas. Cyclic block coordinate descent with variance reduction for composite nonconvex optimization. In International Conference on Machine Learning , pages 3469–3494. PMLR, 2023

2023

[17] [17]

E nhancing sparsity by reweighted ℓ1 minimiza- tion

Emmanuel J Candes, Michael B W akin, and Stephen P Boyd. E nhancing sparsity by reweighted ℓ1 minimiza- tion. Journal of Fourier Analysis and Applications , 14(5):877–905, 2008

2008

[18] [18]

LIBSVM: A library fo r support vector machines

Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library fo r support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) , 2(3):1–27, 2011

2011

[19] [19]

Trust region methods

Andrew R Conn, Nicholas IM Gould, and Philippe L Toint. Trust region methods . SIAM, 2000

2000

[20] [20]

Global convergenc e of arbitrary-block gradient methods for generalized Polyak- Lojasiewicz functions

Dominik Csiba and Peter Richt´ arik. Global convergenc e of arbitrary-block gradient methods for generalized Polyak- Lojasiewicz functions. arXiv preprint arXiv:1709.03014 , 2017

Pith/arXiv arXiv 2017

[21] [21]

A simple convergence proof of Adam and Adagrad

Alexandre D´ efossez, Leon Bottou, Francis Bach, and Ni colas Usunier. A simple convergence proof of Adam and Adagrad. Transactions on Machine Learning Research , 2022

2022

[22] [22]

Nearest neighbor based greedy coordinate descent

Inderjit Dhillon, Pradeep Ravikumar, and Ambuj Tewari . Nearest neighbor based greedy coordinate descent. Advances in Neural Information Processing Systems , 24, 2011

2011

[23] [23]

Alternatin g randomized block coordinate descent

Jelena Diakonikolas and Lorenzo Orecchia. Alternatin g randomized block coordinate descent. In International Conference on Machine Learning , pages 1224–1232. PMLR, 2018

2018

[24] [24]

On the equivalence betw een non-negative matrix factorization and proba- bilistic latent semantic indexing

Chris Ding, Tao Li, and W ei Peng. On the equivalence betw een non-negative matrix factorization and proba- bilistic latent semantic indexing. Computational Statistics & Data Analysis , 52(8):3913–3927, 2008

2008

[25] [25]

Benchmarking optim ization software with performance proﬁles

Elizabeth D Dolan and Jorge J Mor´ e. Benchmarking optim ization software with performance proﬁles. Math- ematical Programming, 91(2):201–213, 2002

2002

[26] [26]

Robust nonnegative matrix factorization via half-quadratic minimiza- tion

Liang Du, Xuan Li, and Yi-Dong Shen. Robust nonnegative matrix factorization via half-quadratic minimiza- tion. In 2012 IEEE 12th International Conference on Data Mining , pages 201–210. IEEE, 2012

2012

[27] [27]

Adaptive subg radient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subg radient methods for online learning and stochastic optimization. Journal of Machine Learning Research , 12(7), 2011

2011

[28] [28]

Lever aging two reference functions in block Bregman proximal gradient descent for non-convex and non-lipschit z problems

Tianxiang Gao, Songtao Lu, Jia Liu, and Chris Chu. Lever aging two reference functions in block Bregman proximal gradient descent for non-convex and non-lipschit z problems. arXiv preprint arXiv:1912.07527 , 2019

arXiv 1912

[29] [29]

On the convergence of Randomized Bregman Coordinate Descent for Non-Lipschitz Composite Problems

Tianxiang Gao, Songtao Lu, Jia Liu, and Chris Chu. On the convergence of Randomized Bregman Coordinate Descent for Non-Lipschitz Composite Problems. In 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages 5549–5553, 2021

2021

[30] [30]

Nonnegative Matrix Factorization

Nicolas Gillis. Nonnegative Matrix Factorization . SIAM, Philadelphia, 2020

2020

[31] [31]

Complexity o f a class of ﬁrst-order objective-function-free optimization algorithms

Serge Gratton, Sadok Jerad, and Ph L Toint. Complexity o f a class of ﬁrst-order objective-function-free optimization algorithms. Optimization Methods and Software , pages 1–31, 2024

2024

[32] [32]

Complexity a nd performance for two classes of noise-tolerant ﬁrst-order algorithms

Serge Gratton, Sadok Jerad, and Ph L Toint. Complexity a nd performance for two classes of noise-tolerant ﬁrst-order algorithms. Optimization Methods and Software , pages 1–27, 2025

2025

[33] [33]

Param etric complexity analysis for a class of ﬁrst-order adagrad-like algorithms

Serge Gratton, Sadok Jerad, and Philippe L Toint. Param etric complexity analysis for a class of ﬁrst-order adagrad-like algorithms. arXiv preprint arXiv:2203.01647 , 2022

arXiv 2022

[34] [34]

Compl exity of adagrad and other ﬁrst-order methods for nonconvex optimization problems with bounds constraints

Serge Gratton, Sadok Jerad, and Philippe L Toint. Compl exity of adagrad and other ﬁrst-order methods for nonconvex optimization problems with bounds constraints. arXiv preprint arXiv:2406.15793 , 2024

arXiv 2024

[35] [35]

S2MPJ and CUTEst optimizat ion problems for Matlab, Python and Julia

Serge Gratton and Ph L Toint. S2MPJ and CUTEst optimizat ion problems for Matlab, Python and Julia. Optimization Methods and Software , 40(4):871–903, 2025

2025

[36] [36]

A uniﬁed convergence theor y for adaptive ﬁrst-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampo o and muon

Serge Gratton and Ph L Toint. A uniﬁed convergence theor y for adaptive ﬁrst-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampo o and muon. arXiv preprint arXiv:2604.17423 , 2026. 26

Pith/arXiv arXiv 2026

[37] [37]

Stochastic converg ence of parallel asynchronous adaptive ﬁrst-order methods

Serge Gratton and Philippe L Toint. Stochastic converg ence of parallel asynchronous adaptive ﬁrst-order methods. arXiv preprint arXiv:2606.01787 , 2026

Pith/arXiv arXiv 2026

[38] [38]

On the convergence o f the block nonlinear Gauss-Seidel method under convex constraints

Luigi Grippo and Marco Sciandrone. On the convergence o f the block nonlinear Gauss-Seidel method under convex constraints. Operations Research Letters, 26(3):127–136, 2000

2000

[39] [39]

MahNMF: Manhattan non-negative matrix factorization

Naiyang Guan, Dacheng Tao, Zhigang Luo, and John Shawe- Taylor. MahNMF: Manhattan non-negative matrix factorization. arXiv preprint arXiv:1207.3438 , 2012

Pith/arXiv arXiv 2012

[40] [40]

Non-negative matrix factorization for face recognition

David Guillamet and Jordi Vitria. Non-negative matrix factorization for face recognition. In Catalonian Conference on Artiﬁcial Intelligence , pages 336–344. Springer, 2002

2002

[41] [41]

A modiﬁed Huber nonnegative matrix factorization algorithm for hyperspectral unmixing

Ziyang Guo, Anyou Min, Bing Yang, Junhong Chen, and Hong Li. A modiﬁed Huber nonnegative matrix factorization algorithm for hyperspectral unmixing. IEEE Journal of Selected Topics in Applied Earth Obser- vations and Remote Sensing , 14:5559–5571, 2021

2021

[42] [42]

Shampoo: P reconditioned stochastic tensor optimization

Vineet Gupta, Tomer Koren, and Yoram Singer. Shampoo: P reconditioned stochastic tensor optimization. In International Conference on Machine Learning , pages 1842–1850. PMLR, 2018

2018

[43] [43]

Fastest rates for s tochastic mirror descent methods

Filip Hanzely and Peter Richt´ arik. Fastest rates for s tochastic mirror descent methods. Computational Opti- mization and Applications , 79(3):717–766, 2021

2021

[44] [44]

Accelera ted Bregman proximal gradient methods for relatively smooth convex optimization

Filip Hanzely, Peter Richtarik, and Lin Xiao. Accelera ted Bregman proximal gradient methods for relatively smooth convex optimization. Computational Optimization and Applications , 79(2):405–440, 2021

2021

[45] [45]

Block majorization minimization with extrapolation and application to NMF

Le Thi Khanh Hien, Valentin Leplat, and Nicolas Gillis. Block majorization minimization with extrapolation and application to NMF. SIAM Journal on Mathematics of Data Science , 7(3):1292–1314, 2025

2025

[46] [46]

Iteration complexity analysis of block coordinate descent methods

Mingyi Hong, Xiangfeng W ang, Meisam Razaviyayn, and Zh i-Quan Luo. Iteration complexity analysis of block coordinate descent methods. Mathematical Programming, 163(1):85–114, 2017

2017

[47] [47]

Muon: An optimizer for hidden layers in neural networks, 202 4

Keller Jordan, Yuchen Jin, Vlado Boza, Jiacheng You, Fr anz Cesista, Laker Newhouse, and Jeremy Bernstein. Muon: An optimizer for hidden layers in neural networks, 202 4

[48] [48]

Robust ℓ1-norm factorization in the presence of outliers and missing data by alternative convex programming

Qifa Ke and Takeo Kanade. Robust ℓ1-norm factorization in the presence of outliers and missing data by alternative convex programming. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 739–746, 2005

2005

[49] [49]

Adam: a method for stochastic optimi zation

Diederik P Kingma. Adam: a method for stochastic optimi zation. arXiv preprint arXiv:1412.6980 , 2014

Pith/arXiv arXiv 2014

[50] [50]

Robust nonneg ative matrix factorization using l21-norm

Deguang Kong, Chris Ding, and Heng Huang. Robust nonneg ative matrix factorization using l21-norm. In Proceedings of the 20th ACM International Conference on Inf ormation and Knowledge Management , pages 673–682, 2011

2011

[51] [51]

Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problem s

Puya Latafat, Andreas Themelis, and Panagiotis Patrin os. Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problem s. Mathematical Programming, 193(1):195–224, 2022

2022

[52] [52]

Gradient-based learning applied to document recognition

Yann LeCun, L´ eon Bottou, Yoshua Bengio, and Patrick Ha ﬀner. Gradient-based learning applied to document recognition. Proceedings of the IEEE , 86(11):2278–2324, 1998

1998

[53] [53]

Random permutation s ﬁx a worst case for cyclic coordinate descent

Ching-Pei Lee and Stephen J W right. Random permutation s ﬁx a worst case for cyclic coordinate descent. IMA Journal of Numerical Analysis , 39(3):1246–1275, 2019

2019

[54] [54]

Learning the parts of objects by non-negative matrix factorization

Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. nature, 401(6755):788–791, 1999

1999

[55] [55]

Eﬃcient accelerated coor dinate descent methods and faster algorithms for solving linear systems

Yin Tat Lee and Aaron Sidford. Eﬃcient accelerated coor dinate descent methods and faster algorithms for solving linear systems. In 2013 IEEE 54th Annual Symposium on Foundations of Computer S cience, pages 147–156. IEEE, 2013

2013

[56] [56]

Coordinate-w ise power method

Qi Lei, Kai Zhong, and Inderjit S Dhillon. Coordinate-w ise power method. Advances in neural information processing systems, 29, 2016

2016

[57] [57]

On faster convergence of cyclic block coordinate descent-type methods for strongly convex minim ization

Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, and Mingyi Ho ng. On faster convergence of cyclic block coordinate descent-type methods for strongly convex minim ization. Journal of Machine Learning Research , 18(184):1–24, 2018

2018

[58] [58]

An improved Adam optimization algorithm combining adaptive coeﬃcients and composite gra dients based on randomized block coordinate descent

Miaomiao Liu, Dan Yao, Zhigang Liu, Jingfeng Guo, and Ji ng Chen. An improved Adam optimization algorithm combining adaptive coeﬃcients and composite gra dients based on randomized block coordinate descent. Computational Intelligence and Neuroscience , 2023(1):4765891, 2023

2023

[59] [59]

Randomized block coordinate n on-monotone gradient method for a class of nonlinear programming

Zhaosong Lu and Lin Xiao. Randomized block coordinate n on-monotone gradient method for a class of nonlinear programming. arXiv preprint arXiv:1306.5918 , 1273, 2013

Pith/arXiv arXiv 2013

[60] [60]

On the complexity analysis of r andomized block-coordinate descent methods

Zhaosong Lu and Lin Xiao. On the complexity analysis of r andomized block-coordinate descent methods. Mathematical Programming, 152(1):615–642, 2015

2015

[61] [61]

On the convergence of the co ordinate descent method for convex diﬀerentiable minimization

Zhi-Quan Luo and Paul Tseng. On the convergence of the co ordinate descent method for convex diﬀerentiable minimization. Journal of Optimization Theory and Applications , 72(1):7–35, 1992

1992

[62] [62]

A signal processi ng perspective on hyperspectral unmixing: Insights from remote sensing

Wing-Kin Ma, Jos´ e M Bioucas-Dias, Tsung-Han Chan, Nic olas Gillis, Paul Gader, Antonio J Plaza, Arul- Murugan Ambikapathi, and Chong-Yung Chi. A signal processi ng perspective on hyperspectral unmixing: Insights from remote sensing. IEEE Signal Processing Magazine , 31(1):67–81, 2013. 27

2013

[63] [63]

Adaptive bound optimization for online convex optimization

H Brendan McMahan and Matthew Streeter. Adaptive bound optimization for online convex optimization. arXiv preprint arXiv:1002.4908 , 2010

Pith/arXiv arXiv 2010

[64] [64]

Eﬃciency of stochast ic coordinate proximal gradient methods on nonsep- arable composite optimization

Ion Necoara and Flavia Chorobura. Eﬃciency of stochast ic coordinate proximal gradient methods on nonsep- arable composite optimization. Mathematics of Operations Research , 50(2):993–1018, 2025

2025

[65] [65]

Parallel random coordi nate descent method for composite minimization: Convergence analysis and error bounds

Ion Necoara and Dragos Clipici. Parallel random coordi nate descent method for composite minimization: Convergence analysis and error bounds. SIAM Journal on Optimization , 26(1):197–226, 2016

2016

[66] [66]

Eﬃciency of coordinate descent methods on huge-scale optimization problems

Yu Nesterov. Eﬃciency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization , 22(2):341–362, 2012

2012

[67] [67]

Eﬃciency of the ac celerated coordinate descent method on structured optimization problems

Yurii Nesterov and Sebastian U Stich. Eﬃciency of the ac celerated coordinate descent method on structured optimization problems. SIAM Journal on Optimization , 27(1):110–123, 2017

2017

[68] [68]

Let’s ma ke block coordinate descent converge faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, an d Superlinear Convergence

Julie Nutini, Issam Laradji, and Mark Schmidt. Let’s ma ke block coordinate descent converge faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, an d Superlinear Convergence. Journal of Machine Learning Research, 23(131):1–74, 2022

2022

[69] [69]

Coordinate descent converges faster with the Gauss-Southwell rule than random selection

Julie Nutini, Mark Schmidt, Issam Laradji, Michael Fri edlander, and Hoyt Koepke. Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In International Conference on Machine Learning, pages 1632–1641. PMLR, 2015

2015

[70] [70]

Eﬃcient random coordi nate descent algorithms for large-scale structured nonconvex optimization

Andrei Patrascu and Ion Necoara. Eﬃcient random coordi nate descent algorithms for large-scale structured nonconvex optimization. Journal of Global Optimization , 61(1):19–46, 2015

2015

[71] [71]

Coordinate friendly structures, algorithms and applications

Zhimin Peng, Tianyu W u, Yangyang Xu, Ming Yan, and W otao Yin. Coordinate friendly structures, algorithms and applications. arXiv preprint arXiv:1601.00863 , 2016

Pith/arXiv arXiv 2016

[72] [72]

prunadag: an adaptive pruning-aware gradient method

Margherita Porcelli, Giovanni Seraghiti, and Philipp e L Toint. prunadag: an adaptive pruning-aware gradient method. Computational Optimization and Applications , 2026

2026

[73] [73]

A uni ﬁed convergence analysis of block successive minimization methods for nonsmooth optimization

Meisam Razaviyayn, Mingyi Hong, and Zhi-Quan Luo. A uni ﬁed convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization , 23(2):1126–1153, 2013

2013

[74] [74]

Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

Peter Richt´ arik and Martin Tak´ aˇ c. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(1):1–38, 2014

2014

[75] [75]

On the ﬁnite time convergen ce of cyclic coordinate descent methods

Ankan Saha and Ambuj Tewari. On the ﬁnite time convergen ce of cyclic coordinate descent methods. arXiv preprint arXiv:1005.2146 , 2010

Pith/arXiv arXiv 2010

[76] [76]

Nonnegative matrix factorization in the component-wise l1 norm for sparse data

Giovanni Seraghiti, K´ evin Dubrulle, Arnaud Vandaele , and Nicolas Gillis. Nonnegative matrix factorization in the component-wise l1 norm for sparse data. arXiv preprint arXiv:2603.29715 , 2026

arXiv 2026

[77] [77]

Document clustering using non- negative matrix factorization

Farial Shahnaz, Michael W Berry, V Paul Pauca, and Rober t J Plemmons. Document clustering using non- negative matrix factorization. Information Processing & Management , 42(2):373–386, 2006

2006

[78] [78]

Stochastic meth ods for ℓ1 regularized loss minimization

Shai Shalev-Shwartz and Ambuj Tewari. Stochastic meth ods for ℓ1 regularized loss minimization. In Proceed- ings of the 26th Annual International Conference on Machine Learning, pages 929–936, 2009

2009

[79] [79]

A primer on coordinate descent algo- rithms

Hao-Jun Michael Shi, Shenyinying Tu, Yangyang Xu, and W otao Yin. A primer on coordinate descent algo- rithms. arXiv preprint arXiv:1610.00040 , 2016

Pith/arXiv arXiv 2016

[80] [80]

Cyclic coordin ate dual averaging with extrapolation

Chaobing Song and Jelena Diakonikolas. Cyclic coordin ate dual averaging with extrapolation. SIAM Journal on Optimization , 33(4):2935–2961, 2023

2023