A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints
Pith reviewed 2026-05-24 09:20 UTC · model grok-4.3
The pith
OBCD updates k rows at a time by exactly solving small nonsmooth subproblems to reach global block-k stationary points with O(1/ε) iteration complexity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the OBCD scheme, by globally solving the nonsmooth subproblems that arise when k rows are updated under orthogonality constraints, produces limit points that are global block-k stationary points and reaches an ε-block-k stationary point after O(1/ε) iterations.
What carries the argument
The OBCD row-wise update that globally solves a small nonsmooth optimization problem under orthogonality constraints for any chosen block of k rows.
If this is right
- Row-wise orthogonal updates can reach any feasible point from any feasible initialization.
- Limit points of OBCD are global block-k stationary points that satisfy stronger optimality than standard critical points.
- An ε-block-k stationary point is found after O(1/ε) iterations.
- Under the Kurdyka–Lojasiewicz inequality a non-ergodic convergence rate holds.
Where Pith is reading between the lines
- If similar globally solvable subproblems exist for other matrix constraints, the same block-update completeness argument could apply.
- The stronger stationarity notion may translate into better empirical performance on downstream statistical-learning tasks that rely on orthogonal matrices.
- Breakpoint search techniques developed for the subproblems could be reused in other nonconvex composite problems that admit low-dimensional orthogonal blocks.
Load-bearing premise
The small nonsmooth optimization subproblems under orthogonality constraints that arise when updating k rows can be solved to global optimality using breakpoint search methods.
What would settle it
A concrete numerical instance in which breakpoint search returns a suboptimal subproblem solution and the overall algorithm fails to produce a global block-k stationary point, or an experiment showing that more than linear-in-1/ε iterations are required to reach an ε-block-k stationary point.
Figures
read the original abstract
Nonsmooth composite optimization with orthogonality constraints has a wide range of applications in statistical learning and data science. However, this problem is challenging due to its nonsmooth objective and computationally expensive nonconvex constraints. In this paper, we propose a new approach called \textbf{OBCD}, which leverages block coordinate descent to address these challenges. \textbf{OBCD} is a feasible method with a small computational footprint. In each iteration, it updates \(k\) rows of the solution matrix, where \(k \geq 2\), by globally solving a small nonsmooth optimization problem under orthogonality constraints. We prove the completeness of the proposed update scheme, showing that row-wise orthogonal updates can reach any feasible point from any feasible initialization. We further prove that the limit points generated by \textbf{OBCD}, referred to as global block-\(k\) stationary points, offer stronger optimality than standard critical points. Furthermore, we show that \textbf{OBCD} finds an \(\epsilon\)-block-\(k\) stationary point with an iteration complexity of \(\mathcal{O}(1/\epsilon)\). Additionally, under the Kurdyka--Lojasiewicz (KL) inequality, we establish the non-ergodic convergence rate of \textbf{OBCD}. We also demonstrate how novel breakpoint search methods can be used to solve the subproblems arising in \textbf{OBCD}. Empirical results show that our approach consistently outperforms existing methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes OBCD, a block coordinate descent algorithm for nonsmooth composite optimization subject to orthogonality constraints. In each iteration, k rows (k≥2) of the matrix variable are updated by globally solving a small nonsmooth subproblem under row-wise orthogonality constraints, using novel breakpoint search methods. The central claims are: (i) the row-wise orthogonal update scheme is complete, i.e., any feasible point is reachable from any feasible initialization; (ii) limit points are global block-k stationary points, a stronger notion than standard critical points; (iii) OBCD reaches an ε-block-k stationary point in O(1/ε) iterations; (iv) non-ergodic convergence rates hold under the KL inequality; and (v) the method empirically outperforms existing approaches on statistical learning tasks.
Significance. If the central claims hold, the work supplies a feasible, low-footprint method with explicit iteration complexity and a strictly stronger stationarity concept for a practically relevant class of orthogonality-constrained nonsmooth problems. The completeness result for the update scheme and the O(1/ε) bound to ε-block-k stationarity are notable strengths that would distinguish the contribution from standard BCD analyses.
major comments (2)
- [subproblem solver description (near the statement of the breakpoint search)] The section describing the breakpoint search methods for the k-row subproblems: the manuscript asserts that these methods solve each nonsmooth composite subproblem to global optimality, yet supplies no explicit argument that the enumerated breakpoints cover every candidate point or that the nonsmooth terms (ℓ1, indicators, etc.) cannot introduce additional stationary points missed by the search. Because global optimality of every block update is required for the descent property, the completeness theorem, the definition of global block-k stationarity, and the O(1/ε) complexity bound all rest on this unverified claim.
- [complexity theorem] Theorem establishing the O(1/ε) iteration complexity: the proof invokes that each subproblem is solved to global optimality to obtain a uniform descent amount; if the breakpoint procedure returns only a local solution on some instances, the descent lemma fails and the complexity guarantee does not hold.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The two major comments correctly identify that the current manuscript asserts global optimality of the breakpoint search procedures for the k-row subproblems but does not supply an explicit, self-contained argument that the enumerated breakpoints exhaust all candidate minimizers and that the nonsmooth terms cannot create additional stationary points outside the search. We will address this gap by adding the required proof in the revised version; the remainder of the analysis then follows as stated.
read point-by-point responses
-
Referee: [subproblem solver description (near the statement of the breakpoint search)] The section describing the breakpoint search methods for the k-row subproblems: the manuscript asserts that these methods solve each nonsmooth composite subproblem to global optimality, yet supplies no explicit argument that the enumerated breakpoints cover every candidate point or that the nonsmooth terms (ℓ1, indicators, etc.) cannot introduce additional stationary points missed by the search. Because global optimality of every block update is required for the descent property, the completeness theorem, the definition of global block-k stationarity, and the O(1/ε) complexity bound all rest on this unverified claim.
Authors: We agree that an explicit verification is required. In the revised manuscript we will insert a dedicated subsection (immediately following the description of the breakpoint search) that proves: (i) every point at which the subdifferential of the nonsmooth composite objective can contain zero under the row-wise orthogonality constraint must coincide with one of the enumerated breakpoints or an endpoint of the feasible interval; (ii) the nonsmooth terms (ℓ1 norms and indicator functions) contribute only finitely many additional candidate points that are already captured by the breakpoint enumeration; and (iii) exhaustive evaluation at these points therefore yields the global minimizer. This argument will be written in a self-contained manner so that the descent property, completeness of the update scheme, global block-k stationarity, and complexity bound rest on a fully justified foundation. revision: yes
-
Referee: [complexity theorem] Theorem establishing the O(1/ε) iteration complexity: the proof invokes that each subproblem is solved to global optimality to obtain a uniform descent amount; if the breakpoint procedure returns only a local solution on some instances, the descent lemma fails and the complexity guarantee does not hold.
Authors: The complexity proof indeed relies on a uniform descent amount that is guaranteed only when each block subproblem is solved to global optimality. Once the explicit argument requested in the first comment is added, the descent lemma holds without qualification and the O(1/ε) bound follows exactly as written. No change to the statement or proof structure of the complexity theorem itself will be needed beyond referencing the new subsection on global optimality of the subproblem solver. revision: yes
Circularity Check
No significant circularity; theoretical results are independently derived
full rationale
The paper presents explicit proofs for the completeness of the row-wise orthogonal update scheme (reaching any feasible point), the stronger optimality of global block-k stationary points, and the O(1/ε) iteration complexity to ε-block-k stationarity. These are standard convergence arguments in block coordinate descent literature and do not reduce to fitted parameters, self-definitions, or self-citation chains. The breakpoint search method for subproblems is introduced as a separate algorithmic contribution rather than an unverified assumption that collapses the main claims. No load-bearing step matches any enumerated circularity pattern.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Kurdyka--Lojasiewicz (KL) inequality
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We also demonstrate how novel breakpoint search methods can be used to solve the subproblems arising in OBCD... Any orthogonal matrix V ∈ St(2,2) can be expressed as V = Vrot_θ or V = Vref_θ ... min_θ ½∥V∥²_Q + ⟨V,P⟩ + h(VZ) s.t. V ∈ {Vrot_θ, Vref_θ}
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We prove the completeness of the proposed update scheme, showing that row-wise orthogonal updates can reach any feasible point from any feasible initialization... limit points ... global block-k stationary points
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Advances in Neural Information Processing Systems (NeurIPS), 26, 2013
Fantope projection and selection: A near-optimal convex relaxation of sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 26, 2013
work page 2013
-
[2]
E., Eriksson, J., and Koivunen, V
Abrudan, T. E., Eriksson, J., and Koivunen, V. Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Transactions on Signal Processing, 56 0 (3): 0 1134--1147, 2008
work page 2008
-
[3]
Absil, P., Mahony, R. E., and Sepulchre, R. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008
work page 2008
-
[4]
Attouch, H., Bolte, J., Redont, P., and Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-lojasiewicz inequality. Mathematics of Operations Research, 35 0 (2): 0 438--457, 2010
work page 2010
-
[5]
Deep layers as stochastic solvers
Bibi, A., Ghanem, B., Koltun, V., and Ranftl, R. Deep layers as stochastic solvers. In International Conference on Learning Representations (ICLR). OpenReview.net, 2019
work page 2019
-
[6]
Proximal alternating linearized minimization for nonconvex and nonsmooth problems
Bolte, J., Sabach, S., and Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146 0 (1-2): 0 459--494, 2014
work page 2014
-
[7]
Boumal, N. and Absil, P.-a. Rtrmc: A riemannian trust-region method for low-rank matrix completion. Advances in Neural Information Processing Systems (NeurIPS), 24, 2011
work page 2011
-
[8]
Proximal gradient method for nonsmooth optimization over the stiefel manifold
Chen, S., Ma, S., Man-Cho So, A., and Zhang, T. Proximal gradient method for nonsmooth optimization over the stiefel manifold. SIAM Journal on Optimization, 30 0 (1): 0 210--239, 2020
work page 2020
-
[9]
Chen, W., Ji, H., and You, Y. An augmented lagrangian method for _1 -regularized optimization problems with orthogonality constraints. SIAM Journal on Scientific Computing, 38 0 (4): 0 B570--B592, 2016
work page 2016
-
[10]
Optimal solutions for sparse principal component analysis
d'Aspremont, A., Bach, F., and El Ghaoui, L. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9 0 (7), 2008
work page 2008
-
[11]
Edelman, A., Arias, T. A., and Smith, S. T. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20 0 (2): 0 303--353, 1998
work page 1998
-
[12]
Frerix, T. and Bruna, J. Approximating orthogonal matrices with effective givens factorization. In International Conference on Machine Learning (ICML), pp.\ 1993--2001, 2019
work page 1993
-
[13]
A new first-order algorithmic framework for optimization problems with orthogonality constraints
Gao, B., Liu, X., Chen, X., and Yuan, Y.-x. A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM Journal on Optimization, 28 0 (1): 0 302--332, 2018
work page 2018
-
[14]
Parallelizable algorithms for optimization problems with orthogonality constraints
Gao, B., Liu, X., and Yuan, Y.-x. Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM Journal on Scientific Computing, 41 0 (3): 0 A1949--A1983, 2019
work page 2019
-
[15]
Golub, G. H. and Van Loan, C. F. Matrix computations. 2013
work page 2013
-
[16]
He, B. and Yuan, X. On the O (1/n) convergence rate of the douglascrachford alternating direction method. SIAM Journal on Numerical Analysis, 50 0 (2): 0 700--709, 2012
work page 2012
-
[17]
Adaptive quadratically regularized newton method for riemannian optimization
Hu, J., Milzarek, A., Wen, Z., and Yuan, Y. Adaptive quadratically regularized newton method for riemannian optimization. SIAM Journal on Matrix Analysis and Applications, 39 0 (3): 0 1181--1207, 2018
work page 2018
-
[18]
Structured quasi-newton methods for optimization with orthogonality constraints
Hu, J., Jiang, B., Lin, L., Wen, Z., and Yuan, Y.-x. Structured quasi-newton methods for optimization with orthogonality constraints. SIAM Journal on Scientific Computing, 41 0 (4): 0 A2239--A2269, 2019
work page 2019
-
[19]
A brief introduction to manifold optimization
Hu, J., Liu, X., Wen, Z.-W., and Yuan, Y.-X. A brief introduction to manifold optimization. Journal of the Operations Research Society of China, 8 0 (2): 0 199--248, 2020
work page 2020
-
[20]
Hwang, S. J., Collins, M. D., Ravi, S. N., Ithapu, V. K., Adluru, N., Johnson, S. C., and Singh, V. A projection free method for generalized eigenvalue problem with a nonsmooth regularizer. In IEEE International Conference on Computer Vision (ICCV), pp.\ 1841--1849, 2015
work page 2015
-
[21]
Jiang, B. and Dai, Y.-H. A framework of constraint preserving update schemes for optimization on stiefel manifold. Mathematical Programming, 153 0 (2): 0 535--575, 2015
work page 2015
-
[22]
_p -norm regularization algorithms for optimization over permutation matrices
Jiang, B., Liu, Y.-F., and Wen, Z. _p -norm regularization algorithms for optimization over permutation matrices. SIAM Journal on Optimization, 26 0 (4): 0 2284--2313, 2016
work page 2016
-
[23]
An exact penalty approach for optimization with nonnegative orthogonality constraints
Jiang, B., Meng, X., Wen, Z., and Chen, X. An exact penalty approach for optimization with nonnegative orthogonality constraints. Mathematical Programming, pp.\ 1--43, 2022
work page 2022
-
[24]
Generalized power method for sparse principal component analysis
Journ \'e e, M., Nesterov, Y., Richt \'a rik, P., and Sepulchre, R. Generalized power method for sparse principal component analysis. Journal of Machine Learning Research, 11 0 (2), 2010
work page 2010
-
[25]
Lai, R. and Osher, S. A splitting method for orthogonality constrained problems. Journal of Scientific Computing, 58 0 (2): 0 431--449, 2014
work page 2014
-
[26]
Weakly convex optimization over stiefel manifold using riemannian subgradient-type methods
Li, X., Chen, S., Deng, Z., Qu, Q., Zhu, Z., and Man-Cho So, A. Weakly convex optimization over stiefel manifold using riemannian subgradient-type methods. SIAM Journal on Optimization, 31 0 (3): 0 1605--1634, 2021
work page 2021
-
[27]
Liu, H., Wu, W., and So, A. M.-C. Quadratic optimization with orthogonality constraints: Explicit lojasiewicz exponent and linear convergence of line-search methods. In International Conference on Machine Learning (ICML), pp.\ 1158--1167, 2016
work page 2016
-
[28]
Liu, H., Yue, M.-C., and Man-Cho So, A. On the estimation performance and convergence rate of the generalized power method for phase synchronization. SIAM Journal on Optimization, 27 0 (4): 0 2426--2446, 2017
work page 2017
-
[29]
Liu, W., Zhang, Y., Yang, H., and Zhang, S. A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optimization, 69 0 (3): 0 399--426, 2020
work page 2020
-
[30]
On the convergence of the self-consistent field iteration in kohn--sham density functional theory
Liu, X., Wang, X., Wen, Z., and Yuan, Y. On the convergence of the self-consistent field iteration in kohn--sham density functional theory. SIAM Journal on Matrix Analysis and Applications, 35 0 (2): 0 546--558, 2014
work page 2014
-
[31]
An efficient gauss--newton algorithm for symmetric low-rank product matrix approximations
Liu, X., Wen, Z., and Zhang, Y. An efficient gauss--newton algorithm for symmetric low-rank product matrix approximations. SIAM Journal on Optimization, 25 0 (3): 0 1571--1608, 2015
work page 2015
-
[32]
Lu, Z. and Zhang, Y. An augmented lagrangian approach for sparse principal component analysis. Mathematical Programming, 135 0 (1-2): 0 149--193, 2012
work page 2012
-
[33]
Optimization with first-order surrogate functions
Mairal, J. Optimization with first-order surrogate functions. In International Conference on Machine Learning (ICML), volume 28, pp.\ 783--791, 2013
work page 2013
-
[34]
Massart, E. and Abrol, V. Coordinate descent on the orthogonal group for recurrent neural network training. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 36, pp.\ 7744--7751, 2022
work page 2022
-
[35]
Mordukhovich, B. S. Variational analysis and generalized differentiation i: Basic theory. Berlin Springer, 330, 2006
work page 2006
-
[36]
Coordinate descent method for k-means
Nie, F., Xue, J., Wu, D., Wang, R., Li, H., and Li, X. Coordinate descent method for k-means. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 0 (5): 0 2371--2385, 2022
work page 2022
-
[37]
Exact penalty methods for minimizing a smooth function over the nonnegative orthogonal set
Qian, Y., Pan, S., and Xiao, L. Exact penalty methods for minimizing a smooth function over the nonnegative orthogonal set. arXiv, 11 2021
work page 2021
-
[38]
A unified convergence analysis of block successive minimization methods for nonsmooth optimization
Razaviyayn, M., Hong, M., and Luo, Z. A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization, 23 0 (2): 0 1126--1153, 2013
work page 2013
-
[39]
Rockafellar, R. T. and Wets., R. J.-B. Variational analysis. Springer Science & Business Media, 317, 2009
work page 2009
-
[40]
Shalit, U. and Chechik, G. Coordinate-descent for learning orthogonal matrices through givens rotations. In International Conference on Machine Learning (ICML), pp.\ 548--556. PMLR, 2014
work page 2014
-
[41]
Sun, X. and Bischof, C. A basis-kernel representation of orthogonal matrices. SIAM Journal on Matrix Analysis and Applications, 16 0 (4): 0 1184--1196, 1995
work page 1995
-
[42]
Tseng, P. and Yun, S. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117 0 (1-2): 0 387--423, 2009
work page 2009
-
[43]
Q., Cho, J., Lei, J., and Rohe, K
Vu, V. Q., Cho, J., Lei, J., and Rohe, K. Fantope projection and selection: A near-optimal convex relaxation of sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 26, 2013
work page 2013
-
[44]
Wen, Z. and Yin, W. A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142 0 (1-2): 0 397, 2013
work page 2013
-
[45]
Trace-penalty minimization for large-scale eigenspace computation
Wen, Z., Yang, C., Liu, X., and Zhang, Y. Trace-penalty minimization for large-scale eigenspace computation. Journal of Scientific Computing, 66 0 (3): 0 1175--1203, 2016
work page 2016
-
[46]
Quartic equation: https://en.wikipedia.org/wiki/Quartic_equation
WikiContributors. Quartic equation: https://en.wikipedia.org/wiki/Quartic_equation. pp.\ Last edited in March, 2023
work page 2023
-
[47]
Xu, Y. and Yin, W. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6 0 (3): 0 1758--1789, 2013
work page 2013
-
[48]
Coordinate descent methods for fractional minimization
Yuan, G. Coordinate descent methods for fractional minimization. arXiv, 2022
work page 2022
-
[49]
Coordinate descent methods for dc minimization: Optimality conditions and global convergence
Yuan, G. Coordinate descent methods for dc minimization: Optimality conditions and global convergence. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023
work page 2023
-
[50]
A block decomposition algorithm for sparse optimization
Yuan, G., Shen, L., and Zheng, W.-S. A block decomposition algorithm for sparse optimization. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), 2020
work page 2020
-
[51]
Zass, R. and Shashua, A. Nonnegative sparse PCA . Advances in Neural Information Processing Systems (NeurIPS), pp.\ 1561--1568, 2006 a
work page 2006
-
[52]
Zass, R. and Shashua, A. Nonnegative sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 19, 2006 b
work page 2006
-
[53]
Zeng, J., Lau, T. T.-K., Lin, S., and Yao, Y. Global convergence of block coordinate descent in deep learning. In International Conference on Machine Learning (ICML), pp.\ 7313--7323. PMLR, 2019
work page 2019
-
[54]
Complete dictionary learning via l4-norm maximization over the orthogonal group
Zhai, Y., Yang, Z., Liao, Z., Wright, J., and Ma, Y. Complete dictionary learning via l4-norm maximization over the orthogonal group. Journal of Machine Learning Research, 21 0 (165): 0 1--68, 2020
work page 2020
-
[55]
Primal-dual optimization algorithms over riemannian manifolds: an iteration complexity analysis
Zhang, J., Ma, S., and Zhang, S. Primal-dual optimization algorithms over riemannian manifolds: an iteration complexity analysis. Mathematical Programming, pp.\ 1--46, 2019
work page 2019
-
[56]
Gradient type optimization methods for electronic structure calculations
Zhang, X., Zhu, J., Wen, Z., and Zhou, A. Gradient type optimization methods for electronic structure calculations. SIAM Journal on Scientific Computing, 36 0 (3): 0 265--289, 2014
work page 2014
-
[57]
A riemannian newton algorithm for nonlinear eigenvalue problems
Zhao, Z., Bai, Z.-J., and Jin, X.-Q. A riemannian newton algorithm for nonlinear eigenvalue problems. SIAM Journal on Matrix Analysis and Applications, 36 0 (2): 0 752--774, 2015
work page 2015
-
[58]
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.