pith. sign in

arxiv: 2606.11791 · v1 · pith:AHTAYTDVnew · submitted 2026-06-10 · 🧮 math.OC

bAdag: an adaptive block coordinate gradient method for smooth nonconvex functions

Pith reviewed 2026-06-27 08:58 UTC · model grok-4.3

classification 🧮 math.OC
keywords block coordinate gradientadaptive methodsnonconvex optimizationAdaGradconvergence ratesGauss-SouthwellOFFO methods
0
0 comments X

The pith

bAdag adapts step sizes from cumulative block gradients to prove sublinear ergodic convergence for smooth nonconvex minimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces bAdag, a block coordinate gradient method modeled on AdaGrad that sets each step size from the running sum of squared block gradients instead of full gradients. It proves that the ergodic average of squared gradient norms reaches zero at a sublinear rate for smooth nonconvex objectives. The same rates hold under three block selection rules: cyclic, uniform random, and Gauss-Southwell. The method requires no objective evaluations and extends to box-constrained problems. This supplies convergence guarantees for adaptive block methods on large-scale nonconvex tasks where full gradients are expensive.

Core claim

bAdag computes an adaptive step size from the cumulative sum of squared block gradients at each iteration and establishes that the ergodic average of the squared gradient norms converges to zero at a sublinear rate when the objective is smooth and possibly nonconvex, provided the gradient satisfies block Lipschitz continuity; the result holds for cyclic, uniform random, and Gauss-Southwell selection and carries over to box-constrained problems.

What carries the argument

Adaptive step size computed from the cumulative sum of squared norms of the selected block gradients, replacing the full-gradient accumulation of standard AdaGrad.

If this is right

  • Ergodic sublinear rates hold uniformly for cyclic, uniform random, and Gauss-Southwell block selection.
  • The same rates and analysis extend to box-constrained smooth problems.
  • The algorithm never evaluates the objective function value.
  • Numerical experiments on synthetic and real-world instances confirm the predicted behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The uniform rate across selection rules suggests that practical speed differences among the three rules arise mainly from implementation cost rather than asymptotic guarantees.
  • The OFFO property opens the door to applying the same cumulative-block mechanism inside other first-order frameworks that avoid function evaluations.
  • If the block Lipschitz assumption can be replaced by a weaker local condition, the method could apply to a broader class of nonconvex problems without changing the algorithm.

Load-bearing premise

The gradient satisfies block Lipschitz continuity.

What would settle it

A smooth nonconvex function whose gradient meets the block Lipschitz condition, yet the ergodic average of squared gradient norms fails to approach zero when bAdag is run with any of the three selection rules.

Figures

Figures reproduced from arXiv: 2606.11791 by Giovanni Seraghiti.

Figure 1
Figure 1. Figure 1: Performance profile according the definition given in [26]. Th [PITH_FULL_IMAGE:figures/full_fig_p020_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Huber-NMF objective function values along the iterations [PITH_FULL_IMAGE:figures/full_fig_p023_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Visual reconstruction of 64 randomly selected images fro [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗
read the original abstract

A new Block Coordinate Gradient (BCG) method, dubbed bAdag, for smooth, nonconvex minimization problem is proposed; it falls in the class of Objective Function Free Optimization (OFFO) methods, and it is based on the AdaGrad algorithm. At each iteration, our method computes an adaptive step size based on the cumulative sum of block gradients, instead of full gradients as in AdaGrad-type methods. We prove ergodic, sublinear convergence rates for the bAdag algorithm when minimizing a smooth, possibly nonconvex objective under the (block) Lipschitz continuity assumption on the gradient. Our theory covers three widely popular block selection strategies: the Cyclic (C) rule, Uniform Random selection (UR), and the greedy Gauss-Southwell (GS) rule. We also extend our algorithm and its convergence theory to box-constrained smooth functions. We validate the proposed algorithms through synthetic and real-world experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes bAdag, an adaptive block coordinate gradient method based on AdaGrad for minimizing smooth nonconvex functions. It proves ergodic sublinear convergence rates under the standard block Lipschitz continuity assumption on the gradient, covering the cyclic, uniform random, and Gauss-Southwell block selection rules. The theory and algorithm are extended to box-constrained problems, with validation on synthetic and real-world experiments.

Significance. If the stated rates hold, the work supplies convergence guarantees for an adaptive BCG method in the nonconvex regime under the block-Lipschitz assumption, extending to three common selection strategies and box constraints. This is a useful addition to the OFFO literature, where adaptive step-size constructions with block-coordinate analysis remain relatively sparse.

minor comments (3)
  1. [§2] §2, definition of the cumulative block-gradient accumulator: the notation G_k^{(i)} is introduced without an explicit recursive formula; adding the update equation would improve readability.
  2. [Theorem 3.1] Theorem 3.1 (ergodic rate): the dependence of the constant on the block-Lipschitz parameter L_i is stated but the explicit scaling with the number of blocks is not highlighted; a remark clarifying this would help.
  3. [§5] Numerical section: the comparison baselines do not include a non-adaptive BCG method with the same block selection rules; adding one would strengthen the empirical claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript on bAdag and the recommendation for minor revision. The referee's summary correctly reflects the paper's contributions regarding the adaptive block coordinate method, convergence theory under block Lipschitz assumptions, coverage of multiple selection rules, extension to box constraints, and experimental validation.

Circularity Check

0 steps flagged

No significant circularity; rates derived from external Lipschitz assumption

full rationale

The paper states an explicit external assumption (block Lipschitz continuity of the gradient) and derives ergodic sublinear rates for bAdag under that assumption for standard selection rules (C/UR/GS) and a box-constraint extension. No step reduces a claimed result to a quantity defined from the algorithm outputs themselves, no fitted parameter is relabeled as a prediction, and no load-bearing self-citation chain is present in the abstract or description. The derivation chain is therefore self-contained against the stated assumption.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of block Lipschitz continuity of the gradient; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption The gradient of the objective is (block) Lipschitz continuous
    Invoked explicitly in the abstract as the condition under which the convergence rates hold.

pith-pipeline@v0.9.1-grok · 5680 in / 1151 out tokens · 27117 ms · 2026-06-27T08:58:58.850076+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

96 extracted references · 1 canonical work pages

  1. [1]

    Musco, C

    Mushroom. UCI Machine Learning Repository, 1981. DOI: h ttps://doi.org/10.24432/C5959T. [2] . UCI machine learning repository, 2013

  2. [2]

    An accelerated coordinate g radient descent algorithm for non-separable composite optimization

    Aviad Aberdam and Amir Beck. An accelerated coordinate g radient descent algorithm for non-separable composite optimization. Journal of Optimization Theory and Applications , 193(1):219–246, 2022

  3. [3]

    Even faster accelerated coordinate descent using non-uniform sampling

    Zeyuan Allen-Zhu, Zheng Qu, Peter Richt´ arik, and Yang Y uan. Even faster accelerated coordinate descent using non-uniform sampling. In International Conference on Machine Learning , pages 1110–1119. PMLR, 2016

  4. [4]

    Accelerating n onnegative matrix factorization algorithms using extrapolation

    Andersen Man Shun Ang and Nicolas Gillis. Accelerating n onnegative matrix factorization algorithms using extrapolation. Neural Computation , 31(2):417–439, 2019

  5. [5]

    Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-/suppress lojasiewicz inequality

    H´ edy Attouch, J´ erˆ ome Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-/suppress lojasiewicz inequality. Mathematics of Operations Research , 35(2):438–457, 2010

  6. [6]

    Con vergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods

    Hedy Attouch, J´ erˆ ome Bolte, and Benar Fux Svaiter. Con vergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical programming, 137(1):91–129, 2013

  7. [7]

    Optimisation

    Alfred Auslender. Optimisation. M´ ethodes num´ eriques, 1976

  8. [8]

    Instance-wise distributionally robust nonnegative matrix factorization

    W afa Barkhoda, Amjad Seyedi, Nicolas Gillis, and Fardin Akhlaghian Tab. Instance-wise distributionally robust nonnegative matrix factorization. Pattern Recognition, 169:111732, 2026

  9. [9]

    On the convergence of alternating minimizat ion for convex programming with applications to iteratively reweighted least squares and decomposition sc hemes

    Amir Beck. On the convergence of alternating minimizat ion for convex programming with applications to iteratively reweighted least squares and decomposition sc hemes. SIAM Journal on Optimization , 25(1):185– 209, 2015. 25

  10. [10]

    On the convergence of b lock coordinate descent type methods

    Amir Beck and Luba Tetruashvili. On the convergence of b lock coordinate descent type methods. SIAM journal on Optimization , 23(4):2037–2060, 2013

  11. [11]

    Fast stochastic second-order adagrad for nonconvex bound-constrained optimization

    Stefania Bellavia, Serge Gratton, Benedetta Morini, a nd Ph L Toint. Fast stochastic second-order adagrad for nonconvex bound-constrained optimization. arXiv preprint arXiv:2505.06374 , 2025

  12. [12]

    Nonlinear programming

    Dimitri P Bertsekas. Nonlinear programming. Journal of the Operational Research Society , 48(3):334–334, 1997

  13. [13]

    Hyperspectral unmixing overview: Geometri cal, statistical, and sparse regression-based ap- proaches

    Jos´ e M Bioucas-Dias, Antonio Plaza, Nicolas Dobigeon , Mario Parente, Qian Du, Paul Gader, and Joce- lyn Chanussot. Hyperspectral unmixing overview: Geometri cal, statistical, and sparse regression-based ap- proaches. IEEE Journal of Selected Topics in Applied Earth Observatio ns and Remote Sensing , 5(2):354–379, 2012

  14. [14]

    Proximal alternating linearized minimization for nonconvex and nonsmooth problems

    J´ erˆ ome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1):459–494, 2014

  15. [15]

    A cyclic block coordinate descent method with gener- alized gradient projections

    Silvia Bonettini, Marco Prato, and Simone Rebegoldi. A cyclic block coordinate descent method with gener- alized gradient projections. Applied Mathematics and Computation , 286:288–300, 2016

  16. [16]

    Cyclic block coordinate descent with variance reduction for composite nonconvex optimization

    Xufeng Cai, Chaobing Song, Stephen W right, and Jelena D iakonikolas. Cyclic block coordinate descent with variance reduction for composite nonconvex optimization. In International Conference on Machine Learning , pages 3469–3494. PMLR, 2023

  17. [17]

    E nhancing sparsity by reweighted ℓ1 minimiza- tion

    Emmanuel J Candes, Michael B W akin, and Stephen P Boyd. E nhancing sparsity by reweighted ℓ1 minimiza- tion. Journal of Fourier Analysis and Applications , 14(5):877–905, 2008

  18. [18]

    LIBSVM: A library fo r support vector machines

    Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library fo r support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) , 2(3):1–27, 2011

  19. [19]

    Trust region methods

    Andrew R Conn, Nicholas IM Gould, and Philippe L Toint. Trust region methods . SIAM, 2000

  20. [20]

    Global convergenc e of arbitrary-block gradient methods for generalized Polyak- Lojasiewicz functions

    Dominik Csiba and Peter Richt´ arik. Global convergenc e of arbitrary-block gradient methods for generalized Polyak- Lojasiewicz functions. arXiv preprint arXiv:1709.03014 , 2017

  21. [21]

    A simple convergence proof of Adam and Adagrad

    Alexandre D´ efossez, Leon Bottou, Francis Bach, and Ni colas Usunier. A simple convergence proof of Adam and Adagrad. Transactions on Machine Learning Research , 2022

  22. [22]

    Nearest neighbor based greedy coordinate descent

    Inderjit Dhillon, Pradeep Ravikumar, and Ambuj Tewari . Nearest neighbor based greedy coordinate descent. Advances in Neural Information Processing Systems , 24, 2011

  23. [23]

    Alternatin g randomized block coordinate descent

    Jelena Diakonikolas and Lorenzo Orecchia. Alternatin g randomized block coordinate descent. In International Conference on Machine Learning , pages 1224–1232. PMLR, 2018

  24. [24]

    On the equivalence betw een non-negative matrix factorization and proba- bilistic latent semantic indexing

    Chris Ding, Tao Li, and W ei Peng. On the equivalence betw een non-negative matrix factorization and proba- bilistic latent semantic indexing. Computational Statistics & Data Analysis , 52(8):3913–3927, 2008

  25. [25]

    Benchmarking optim ization software with performance profiles

    Elizabeth D Dolan and Jorge J Mor´ e. Benchmarking optim ization software with performance profiles. Math- ematical Programming, 91(2):201–213, 2002

  26. [26]

    Robust nonnegative matrix factorization via half-quadratic minimiza- tion

    Liang Du, Xuan Li, and Yi-Dong Shen. Robust nonnegative matrix factorization via half-quadratic minimiza- tion. In 2012 IEEE 12th International Conference on Data Mining , pages 201–210. IEEE, 2012

  27. [27]

    Adaptive subg radient methods for online learning and stochastic optimization

    John Duchi, Elad Hazan, and Yoram Singer. Adaptive subg radient methods for online learning and stochastic optimization. Journal of Machine Learning Research , 12(7), 2011

  28. [28]

    Lever aging two reference functions in block Bregman proximal gradient descent for non-convex and non-lipschit z problems

    Tianxiang Gao, Songtao Lu, Jia Liu, and Chris Chu. Lever aging two reference functions in block Bregman proximal gradient descent for non-convex and non-lipschit z problems. arXiv preprint arXiv:1912.07527 , 2019

  29. [29]

    On the convergence of Randomized Bregman Coordinate Descent for Non-Lipschitz Composite Problems

    Tianxiang Gao, Songtao Lu, Jia Liu, and Chris Chu. On the convergence of Randomized Bregman Coordinate Descent for Non-Lipschitz Composite Problems. In 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages 5549–5553, 2021

  30. [30]

    Nonnegative Matrix Factorization

    Nicolas Gillis. Nonnegative Matrix Factorization . SIAM, Philadelphia, 2020

  31. [31]

    Complexity o f a class of first-order objective-function-free optimization algorithms

    Serge Gratton, Sadok Jerad, and Ph L Toint. Complexity o f a class of first-order objective-function-free optimization algorithms. Optimization Methods and Software , pages 1–31, 2024

  32. [32]

    Complexity a nd performance for two classes of noise-tolerant first-order algorithms

    Serge Gratton, Sadok Jerad, and Ph L Toint. Complexity a nd performance for two classes of noise-tolerant first-order algorithms. Optimization Methods and Software , pages 1–27, 2025

  33. [33]

    Param etric complexity analysis for a class of first-order adagrad-like algorithms

    Serge Gratton, Sadok Jerad, and Philippe L Toint. Param etric complexity analysis for a class of first-order adagrad-like algorithms. arXiv preprint arXiv:2203.01647 , 2022

  34. [34]

    Compl exity of adagrad and other first-order methods for nonconvex optimization problems with bounds constraints

    Serge Gratton, Sadok Jerad, and Philippe L Toint. Compl exity of adagrad and other first-order methods for nonconvex optimization problems with bounds constraints. arXiv preprint arXiv:2406.15793 , 2024

  35. [35]

    S2MPJ and CUTEst optimizat ion problems for Matlab, Python and Julia

    Serge Gratton and Ph L Toint. S2MPJ and CUTEst optimizat ion problems for Matlab, Python and Julia. Optimization Methods and Software , 40(4):871–903, 2025

  36. [36]

    A unified convergence theor y for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampo o and muon

    Serge Gratton and Ph L Toint. A unified convergence theor y for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampo o and muon. arXiv preprint arXiv:2604.17423 , 2026. 26

  37. [37]

    Stochastic converg ence of parallel asynchronous adaptive first-order methods

    Serge Gratton and Philippe L Toint. Stochastic converg ence of parallel asynchronous adaptive first-order methods. arXiv preprint arXiv:2606.01787 , 2026

  38. [38]

    On the convergence o f the block nonlinear Gauss-Seidel method under convex constraints

    Luigi Grippo and Marco Sciandrone. On the convergence o f the block nonlinear Gauss-Seidel method under convex constraints. Operations Research Letters, 26(3):127–136, 2000

  39. [39]

    MahNMF: Manhattan non-negative matrix factorization

    Naiyang Guan, Dacheng Tao, Zhigang Luo, and John Shawe- Taylor. MahNMF: Manhattan non-negative matrix factorization. arXiv preprint arXiv:1207.3438 , 2012

  40. [40]

    Non-negative matrix factorization for face recognition

    David Guillamet and Jordi Vitria. Non-negative matrix factorization for face recognition. In Catalonian Conference on Artificial Intelligence , pages 336–344. Springer, 2002

  41. [41]

    A modified Huber nonnegative matrix factorization algorithm for hyperspectral unmixing

    Ziyang Guo, Anyou Min, Bing Yang, Junhong Chen, and Hong Li. A modified Huber nonnegative matrix factorization algorithm for hyperspectral unmixing. IEEE Journal of Selected Topics in Applied Earth Obser- vations and Remote Sensing , 14:5559–5571, 2021

  42. [42]

    Shampoo: P reconditioned stochastic tensor optimization

    Vineet Gupta, Tomer Koren, and Yoram Singer. Shampoo: P reconditioned stochastic tensor optimization. In International Conference on Machine Learning , pages 1842–1850. PMLR, 2018

  43. [43]

    Fastest rates for s tochastic mirror descent methods

    Filip Hanzely and Peter Richt´ arik. Fastest rates for s tochastic mirror descent methods. Computational Opti- mization and Applications , 79(3):717–766, 2021

  44. [44]

    Accelera ted Bregman proximal gradient methods for relatively smooth convex optimization

    Filip Hanzely, Peter Richtarik, and Lin Xiao. Accelera ted Bregman proximal gradient methods for relatively smooth convex optimization. Computational Optimization and Applications , 79(2):405–440, 2021

  45. [45]

    Block majorization minimization with extrapolation and application to NMF

    Le Thi Khanh Hien, Valentin Leplat, and Nicolas Gillis. Block majorization minimization with extrapolation and application to NMF. SIAM Journal on Mathematics of Data Science , 7(3):1292–1314, 2025

  46. [46]

    Iteration complexity analysis of block coordinate descent methods

    Mingyi Hong, Xiangfeng W ang, Meisam Razaviyayn, and Zh i-Quan Luo. Iteration complexity analysis of block coordinate descent methods. Mathematical Programming, 163(1):85–114, 2017

  47. [47]

    Muon: An optimizer for hidden layers in neural networks, 202 4

    Keller Jordan, Yuchen Jin, Vlado Boza, Jiacheng You, Fr anz Cesista, Laker Newhouse, and Jeremy Bernstein. Muon: An optimizer for hidden layers in neural networks, 202 4

  48. [48]

    Robust ℓ1-norm factorization in the presence of outliers and missing data by alternative convex programming

    Qifa Ke and Takeo Kanade. Robust ℓ1-norm factorization in the presence of outliers and missing data by alternative convex programming. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 739–746, 2005

  49. [49]

    Adam: a method for stochastic optimi zation

    Diederik P Kingma. Adam: a method for stochastic optimi zation. arXiv preprint arXiv:1412.6980 , 2014

  50. [50]

    Robust nonneg ative matrix factorization using l21-norm

    Deguang Kong, Chris Ding, and Heng Huang. Robust nonneg ative matrix factorization using l21-norm. In Proceedings of the 20th ACM International Conference on Inf ormation and Knowledge Management , pages 673–682, 2011

  51. [51]

    Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problem s

    Puya Latafat, Andreas Themelis, and Panagiotis Patrin os. Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problem s. Mathematical Programming, 193(1):195–224, 2022

  52. [52]

    Gradient-based learning applied to document recognition

    Yann LeCun, L´ eon Bottou, Yoshua Bengio, and Patrick Ha ffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE , 86(11):2278–2324, 1998

  53. [53]

    Random permutation s fix a worst case for cyclic coordinate descent

    Ching-Pei Lee and Stephen J W right. Random permutation s fix a worst case for cyclic coordinate descent. IMA Journal of Numerical Analysis , 39(3):1246–1275, 2019

  54. [54]

    Learning the parts of objects by non-negative matrix factorization

    Daniel D Lee and H Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. nature, 401(6755):788–791, 1999

  55. [55]

    Efficient accelerated coor dinate descent methods and faster algorithms for solving linear systems

    Yin Tat Lee and Aaron Sidford. Efficient accelerated coor dinate descent methods and faster algorithms for solving linear systems. In 2013 IEEE 54th Annual Symposium on Foundations of Computer S cience, pages 147–156. IEEE, 2013

  56. [56]

    Coordinate-w ise power method

    Qi Lei, Kai Zhong, and Inderjit S Dhillon. Coordinate-w ise power method. Advances in neural information processing systems, 29, 2016

  57. [57]

    On faster convergence of cyclic block coordinate descent-type methods for strongly convex minim ization

    Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, and Mingyi Ho ng. On faster convergence of cyclic block coordinate descent-type methods for strongly convex minim ization. Journal of Machine Learning Research , 18(184):1–24, 2018

  58. [58]

    An improved Adam optimization algorithm combining adaptive coefficients and composite gra dients based on randomized block coordinate descent

    Miaomiao Liu, Dan Yao, Zhigang Liu, Jingfeng Guo, and Ji ng Chen. An improved Adam optimization algorithm combining adaptive coefficients and composite gra dients based on randomized block coordinate descent. Computational Intelligence and Neuroscience , 2023(1):4765891, 2023

  59. [59]

    Randomized block coordinate n on-monotone gradient method for a class of nonlinear programming

    Zhaosong Lu and Lin Xiao. Randomized block coordinate n on-monotone gradient method for a class of nonlinear programming. arXiv preprint arXiv:1306.5918 , 1273, 2013

  60. [60]

    On the complexity analysis of r andomized block-coordinate descent methods

    Zhaosong Lu and Lin Xiao. On the complexity analysis of r andomized block-coordinate descent methods. Mathematical Programming, 152(1):615–642, 2015

  61. [61]

    On the convergence of the co ordinate descent method for convex differentiable minimization

    Zhi-Quan Luo and Paul Tseng. On the convergence of the co ordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications , 72(1):7–35, 1992

  62. [62]

    A signal processi ng perspective on hyperspectral unmixing: Insights from remote sensing

    Wing-Kin Ma, Jos´ e M Bioucas-Dias, Tsung-Han Chan, Nic olas Gillis, Paul Gader, Antonio J Plaza, Arul- Murugan Ambikapathi, and Chong-Yung Chi. A signal processi ng perspective on hyperspectral unmixing: Insights from remote sensing. IEEE Signal Processing Magazine , 31(1):67–81, 2013. 27

  63. [63]

    Adaptive bound optimization for online convex optimization

    H Brendan McMahan and Matthew Streeter. Adaptive bound optimization for online convex optimization. arXiv preprint arXiv:1002.4908 , 2010

  64. [64]

    Efficiency of stochast ic coordinate proximal gradient methods on nonsep- arable composite optimization

    Ion Necoara and Flavia Chorobura. Efficiency of stochast ic coordinate proximal gradient methods on nonsep- arable composite optimization. Mathematics of Operations Research , 50(2):993–1018, 2025

  65. [65]

    Parallel random coordi nate descent method for composite minimization: Convergence analysis and error bounds

    Ion Necoara and Dragos Clipici. Parallel random coordi nate descent method for composite minimization: Convergence analysis and error bounds. SIAM Journal on Optimization , 26(1):197–226, 2016

  66. [66]

    Efficiency of coordinate descent methods on huge-scale optimization problems

    Yu Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization , 22(2):341–362, 2012

  67. [67]

    Efficiency of the ac celerated coordinate descent method on structured optimization problems

    Yurii Nesterov and Sebastian U Stich. Efficiency of the ac celerated coordinate descent method on structured optimization problems. SIAM Journal on Optimization , 27(1):110–123, 2017

  68. [68]

    Let’s ma ke block coordinate descent converge faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, an d Superlinear Convergence

    Julie Nutini, Issam Laradji, and Mark Schmidt. Let’s ma ke block coordinate descent converge faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, an d Superlinear Convergence. Journal of Machine Learning Research, 23(131):1–74, 2022

  69. [69]

    Coordinate descent converges faster with the Gauss-Southwell rule than random selection

    Julie Nutini, Mark Schmidt, Issam Laradji, Michael Fri edlander, and Hoyt Koepke. Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In International Conference on Machine Learning, pages 1632–1641. PMLR, 2015

  70. [70]

    Efficient random coordi nate descent algorithms for large-scale structured nonconvex optimization

    Andrei Patrascu and Ion Necoara. Efficient random coordi nate descent algorithms for large-scale structured nonconvex optimization. Journal of Global Optimization , 61(1):19–46, 2015

  71. [71]

    Coordinate friendly structures, algorithms and applications

    Zhimin Peng, Tianyu W u, Yangyang Xu, Ming Yan, and W otao Yin. Coordinate friendly structures, algorithms and applications. arXiv preprint arXiv:1601.00863 , 2016

  72. [72]

    prunadag: an adaptive pruning-aware gradient method

    Margherita Porcelli, Giovanni Seraghiti, and Philipp e L Toint. prunadag: an adaptive pruning-aware gradient method. Computational Optimization and Applications , 2026

  73. [73]

    A uni fied convergence analysis of block successive minimization methods for nonsmooth optimization

    Meisam Razaviyayn, Mingyi Hong, and Zhi-Quan Luo. A uni fied convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization , 23(2):1126–1153, 2013

  74. [74]

    Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

    Peter Richt´ arik and Martin Tak´ aˇ c. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(1):1–38, 2014

  75. [75]

    On the finite time convergen ce of cyclic coordinate descent methods

    Ankan Saha and Ambuj Tewari. On the finite time convergen ce of cyclic coordinate descent methods. arXiv preprint arXiv:1005.2146 , 2010

  76. [76]

    Nonnegative matrix factorization in the component-wise l1 norm for sparse data

    Giovanni Seraghiti, K´ evin Dubrulle, Arnaud Vandaele , and Nicolas Gillis. Nonnegative matrix factorization in the component-wise l1 norm for sparse data. arXiv preprint arXiv:2603.29715 , 2026

  77. [77]

    Document clustering using non- negative matrix factorization

    Farial Shahnaz, Michael W Berry, V Paul Pauca, and Rober t J Plemmons. Document clustering using non- negative matrix factorization. Information Processing & Management , 42(2):373–386, 2006

  78. [78]

    Stochastic meth ods for ℓ1 regularized loss minimization

    Shai Shalev-Shwartz and Ambuj Tewari. Stochastic meth ods for ℓ1 regularized loss minimization. In Proceed- ings of the 26th Annual International Conference on Machine Learning, pages 929–936, 2009

  79. [79]

    A primer on coordinate descent algo- rithms

    Hao-Jun Michael Shi, Shenyinying Tu, Yangyang Xu, and W otao Yin. A primer on coordinate descent algo- rithms. arXiv preprint arXiv:1610.00040 , 2016

  80. [80]

    Cyclic coordin ate dual averaging with extrapolation

    Chaobing Song and Jelena Diakonikolas. Cyclic coordin ate dual averaging with extrapolation. SIAM Journal on Optimization , 33(4):2935–2961, 2023

Showing first 80 references.