A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints

Ganzhao Yuan

arxiv: 2304.03641 · v4 · pith:CNRWB77Inew · submitted 2023-04-07 · 🧮 math.OC · cs.LG· cs.NA· math.NA

A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints

Ganzhao Yuan This is my paper

Pith reviewed 2026-05-24 09:20 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.NAmath.NA

keywords block coordinate descentorthogonality constraintsnonsmooth composite optimizationglobal block-k stationary pointsiteration complexityKurdyka-Lojasiewicz inequality

0 comments

The pith

OBCD updates k rows at a time by exactly solving small nonsmooth subproblems to reach global block-k stationary points with O(1/ε) iteration complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OBCD, a block coordinate descent method for nonsmooth composite optimization under orthogonality constraints. In each step it selects k rows of the current matrix and replaces them by the global minimizer of a small subproblem that keeps the row block orthogonal. The authors prove that such row-wise updates are complete, meaning any feasible point is reachable from any other, and that the limit points satisfy a global block-k stationarity condition stronger than ordinary criticality. They further establish an iteration complexity of O(1/ε) to produce an ε-block-k stationary point and a non-ergodic rate under the Kurdyka–Lojasiewicz inequality.

Core claim

The central claim is that the OBCD scheme, by globally solving the nonsmooth subproblems that arise when k rows are updated under orthogonality constraints, produces limit points that are global block-k stationary points and reaches an ε-block-k stationary point after O(1/ε) iterations.

What carries the argument

The OBCD row-wise update that globally solves a small nonsmooth optimization problem under orthogonality constraints for any chosen block of k rows.

If this is right

Row-wise orthogonal updates can reach any feasible point from any feasible initialization.
Limit points of OBCD are global block-k stationary points that satisfy stronger optimality than standard critical points.
An ε-block-k stationary point is found after O(1/ε) iterations.
Under the Kurdyka–Lojasiewicz inequality a non-ergodic convergence rate holds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If similar globally solvable subproblems exist for other matrix constraints, the same block-update completeness argument could apply.
The stronger stationarity notion may translate into better empirical performance on downstream statistical-learning tasks that rely on orthogonal matrices.
Breakpoint search techniques developed for the subproblems could be reused in other nonconvex composite problems that admit low-dimensional orthogonal blocks.

Load-bearing premise

The small nonsmooth optimization subproblems under orthogonality constraints that arise when updating k rows can be solved to global optimality using breakpoint search methods.

What would settle it

A concrete numerical instance in which breakpoint search returns a suboptimal subproblem solution and the overall algorithm fails to produce a global block-k stationary point, or an experiment showing that more than linear-in-1/ε iterations are required to reach an ε-block-k stationary point.

Figures

Figures reproduced from arXiv: 2304.03641 by Ganzhao Yuan.

**Figure 2.** Figure 2: Geometric Visualizations of Two Examples of [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗

**Figure 3.** Figure 3: The convergence curve of the compared methods for s [PITH_FULL_IMAGE:figures/full_fig_p042_3.png] view at source ↗

**Figure 4.** Figure 4: The convergence curve of the compared methods for s [PITH_FULL_IMAGE:figures/full_fig_p047_4.png] view at source ↗

read the original abstract

Nonsmooth composite optimization with orthogonality constraints has a wide range of applications in statistical learning and data science. However, this problem is challenging due to its nonsmooth objective and computationally expensive nonconvex constraints. In this paper, we propose a new approach called \textbf{OBCD}, which leverages block coordinate descent to address these challenges. \textbf{OBCD} is a feasible method with a small computational footprint. In each iteration, it updates \(k\) rows of the solution matrix, where \(k \geq 2\), by globally solving a small nonsmooth optimization problem under orthogonality constraints. We prove the completeness of the proposed update scheme, showing that row-wise orthogonal updates can reach any feasible point from any feasible initialization. We further prove that the limit points generated by \textbf{OBCD}, referred to as global block-\(k\) stationary points, offer stronger optimality than standard critical points. Furthermore, we show that \textbf{OBCD} finds an \(\epsilon\)-block-\(k\) stationary point with an iteration complexity of \(\mathcal{O}(1/\epsilon)\). Additionally, under the Kurdyka--Lojasiewicz (KL) inequality, we establish the non-ergodic convergence rate of \textbf{OBCD}. We also demonstrate how novel breakpoint search methods can be used to solve the subproblems arising in \textbf{OBCD}. Empirical results show that our approach consistently outperforms existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OBCD adds a completeness result for k-row orthogonal updates and a stronger global block-k stationarity notion, plus O(1/ε) complexity, but everything rests on the breakpoint search actually solving the nonsmooth subproblems to global optimality.

read the letter

The paper's main contribution is the OBCD algorithm, which does block coordinate descent by updating exactly k rows at a time while staying feasible. It proves that these updates are complete, meaning you can reach any feasible matrix from any feasible starting point. It also defines global block-k stationary points as a stronger notion than ordinary critical points and shows an O(1/ε) iteration bound to reach an ε-block-k stationary point, plus non-ergodic linear rate under the KL property. The method keeps a small per-iteration cost and the abstract reports better empirical performance than prior approaches on the tested problems. That package of results is new relative to the BCD literature cited in the abstract. The work is clearly aimed at people who need feasible methods for orthogonality-constrained nonsmooth problems in statistical learning. The formal claims are stated directly and the citation pattern looks standard. The soft spot is the subproblem solver. The entire descent property, stationarity guarantee, and complexity bound require that each k-row nonsmooth subproblem is solved to global optimality. The paper says novel breakpoint search methods handle this, but if the enumeration misses candidate points created by the nonsmooth terms, the analysis does not go through. That assumption is load-bearing and needs explicit verification in the proofs. Minor issues like implementation details for the search would be easy to fix in revision. Overall the paper shows clear thinking on the algorithmic structure and the stationarity definitions. It is worth sending to peer review so referees can check the breakpoint completeness argument and the empirical setup in detail.

Referee Report

2 major / 0 minor

Summary. The paper proposes OBCD, a block coordinate descent algorithm for nonsmooth composite optimization subject to orthogonality constraints. In each iteration, k rows (k≥2) of the matrix variable are updated by globally solving a small nonsmooth subproblem under row-wise orthogonality constraints, using novel breakpoint search methods. The central claims are: (i) the row-wise orthogonal update scheme is complete, i.e., any feasible point is reachable from any feasible initialization; (ii) limit points are global block-k stationary points, a stronger notion than standard critical points; (iii) OBCD reaches an ε-block-k stationary point in O(1/ε) iterations; (iv) non-ergodic convergence rates hold under the KL inequality; and (v) the method empirically outperforms existing approaches on statistical learning tasks.

Significance. If the central claims hold, the work supplies a feasible, low-footprint method with explicit iteration complexity and a strictly stronger stationarity concept for a practically relevant class of orthogonality-constrained nonsmooth problems. The completeness result for the update scheme and the O(1/ε) bound to ε-block-k stationarity are notable strengths that would distinguish the contribution from standard BCD analyses.

major comments (2)

[subproblem solver description (near the statement of the breakpoint search)] The section describing the breakpoint search methods for the k-row subproblems: the manuscript asserts that these methods solve each nonsmooth composite subproblem to global optimality, yet supplies no explicit argument that the enumerated breakpoints cover every candidate point or that the nonsmooth terms (ℓ1, indicators, etc.) cannot introduce additional stationary points missed by the search. Because global optimality of every block update is required for the descent property, the completeness theorem, the definition of global block-k stationarity, and the O(1/ε) complexity bound all rest on this unverified claim.
[complexity theorem] Theorem establishing the O(1/ε) iteration complexity: the proof invokes that each subproblem is solved to global optimality to obtain a uniform descent amount; if the breakpoint procedure returns only a local solution on some instances, the descent lemma fails and the complexity guarantee does not hold.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments correctly identify that the current manuscript asserts global optimality of the breakpoint search procedures for the k-row subproblems but does not supply an explicit, self-contained argument that the enumerated breakpoints exhaust all candidate minimizers and that the nonsmooth terms cannot create additional stationary points outside the search. We will address this gap by adding the required proof in the revised version; the remainder of the analysis then follows as stated.

read point-by-point responses

Referee: [subproblem solver description (near the statement of the breakpoint search)] The section describing the breakpoint search methods for the k-row subproblems: the manuscript asserts that these methods solve each nonsmooth composite subproblem to global optimality, yet supplies no explicit argument that the enumerated breakpoints cover every candidate point or that the nonsmooth terms (ℓ1, indicators, etc.) cannot introduce additional stationary points missed by the search. Because global optimality of every block update is required for the descent property, the completeness theorem, the definition of global block-k stationarity, and the O(1/ε) complexity bound all rest on this unverified claim.

Authors: We agree that an explicit verification is required. In the revised manuscript we will insert a dedicated subsection (immediately following the description of the breakpoint search) that proves: (i) every point at which the subdifferential of the nonsmooth composite objective can contain zero under the row-wise orthogonality constraint must coincide with one of the enumerated breakpoints or an endpoint of the feasible interval; (ii) the nonsmooth terms (ℓ1 norms and indicator functions) contribute only finitely many additional candidate points that are already captured by the breakpoint enumeration; and (iii) exhaustive evaluation at these points therefore yields the global minimizer. This argument will be written in a self-contained manner so that the descent property, completeness of the update scheme, global block-k stationarity, and complexity bound rest on a fully justified foundation. revision: yes
Referee: [complexity theorem] Theorem establishing the O(1/ε) iteration complexity: the proof invokes that each subproblem is solved to global optimality to obtain a uniform descent amount; if the breakpoint procedure returns only a local solution on some instances, the descent lemma fails and the complexity guarantee does not hold.

Authors: The complexity proof indeed relies on a uniform descent amount that is guaranteed only when each block subproblem is solved to global optimality. Once the explicit argument requested in the first comment is added, the descent lemma holds without qualification and the O(1/ε) bound follows exactly as written. No change to the statement or proof structure of the complexity theorem itself will be needed beyond referencing the new subsection on global optimality of the subproblem solver. revision: yes

Circularity Check

0 steps flagged

No significant circularity; theoretical results are independently derived

full rationale

The paper presents explicit proofs for the completeness of the row-wise orthogonal update scheme (reaching any feasible point), the stronger optimality of global block-k stationary points, and the O(1/ε) iteration complexity to ε-block-k stationarity. These are standard convergence arguments in block coordinate descent literature and do not reduce to fitted parameters, self-definitions, or self-citation chains. The breakpoint search method for subproblems is introduced as a separate algorithmic contribution rather than an unverified assumption that collapses the main claims. No load-bearing step matches any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the global solvability of the k-row subproblems and the applicability of the KL inequality for rates; no free parameters or invented entities are indicated in the abstract.

axioms (1)

domain assumption Kurdyka--Lojasiewicz (KL) inequality
Invoked to establish the non-ergodic convergence rate of OBCD.

pith-pipeline@v0.9.0 · 5788 in / 1364 out tokens · 42791 ms · 2026-05-24T09:20:26.198368+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We also demonstrate how novel breakpoint search methods can be used to solve the subproblems arising in OBCD... Any orthogonal matrix V ∈ St(2,2) can be expressed as V = Vrot_θ or V = Vref_θ ... min_θ ½∥V∥²_Q + ⟨V,P⟩ + h(VZ) s.t. V ∈ {Vrot_θ, Vref_θ}
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We prove the completeness of the proposed update scheme, showing that row-wise orthogonal updates can reach any feasible point from any feasible initialization... limit points ... global block-k stationary points

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages

[1]

Advances in Neural Information Processing Systems (NeurIPS), 26, 2013

Fantope projection and selection: A near-optimal convex relaxation of sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 26, 2013

work page 2013
[2]

E., Eriksson, J., and Koivunen, V

Abrudan, T. E., Eriksson, J., and Koivunen, V. Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Transactions on Signal Processing, 56 0 (3): 0 1134--1147, 2008

work page 2008
[3]

E., and Sepulchre, R

Absil, P., Mahony, R. E., and Sepulchre, R. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008

work page 2008
[4]

Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-lojasiewicz inequality

Attouch, H., Bolte, J., Redont, P., and Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-lojasiewicz inequality. Mathematics of Operations Research, 35 0 (2): 0 438--457, 2010

work page 2010
[5]

Deep layers as stochastic solvers

Bibi, A., Ghanem, B., Koltun, V., and Ranftl, R. Deep layers as stochastic solvers. In International Conference on Learning Representations (ICLR). OpenReview.net, 2019

work page 2019
[6]

Proximal alternating linearized minimization for nonconvex and nonsmooth problems

Bolte, J., Sabach, S., and Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146 0 (1-2): 0 459--494, 2014

work page 2014
[7]

and Absil, P.-a

Boumal, N. and Absil, P.-a. Rtrmc: A riemannian trust-region method for low-rank matrix completion. Advances in Neural Information Processing Systems (NeurIPS), 24, 2011

work page 2011
[8]

Proximal gradient method for nonsmooth optimization over the stiefel manifold

Chen, S., Ma, S., Man-Cho So, A., and Zhang, T. Proximal gradient method for nonsmooth optimization over the stiefel manifold. SIAM Journal on Optimization, 30 0 (1): 0 210--239, 2020

work page 2020
[9]

An augmented lagrangian method for _1 -regularized optimization problems with orthogonality constraints

Chen, W., Ji, H., and You, Y. An augmented lagrangian method for _1 -regularized optimization problems with orthogonality constraints. SIAM Journal on Scientific Computing, 38 0 (4): 0 B570--B592, 2016

work page 2016
[10]

Optimal solutions for sparse principal component analysis

d'Aspremont, A., Bach, F., and El Ghaoui, L. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9 0 (7), 2008

work page 2008
[11]

A., and Smith, S

Edelman, A., Arias, T. A., and Smith, S. T. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20 0 (2): 0 303--353, 1998

work page 1998
[12]

and Bruna, J

Frerix, T. and Bruna, J. Approximating orthogonal matrices with effective givens factorization. In International Conference on Machine Learning (ICML), pp.\ 1993--2001, 2019

work page 1993
[13]

A new first-order algorithmic framework for optimization problems with orthogonality constraints

Gao, B., Liu, X., Chen, X., and Yuan, Y.-x. A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM Journal on Optimization, 28 0 (1): 0 302--332, 2018

work page 2018
[14]

Parallelizable algorithms for optimization problems with orthogonality constraints

Gao, B., Liu, X., and Yuan, Y.-x. Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM Journal on Scientific Computing, 41 0 (3): 0 A1949--A1983, 2019

work page 2019
[15]

Golub, G. H. and Van Loan, C. F. Matrix computations. 2013

work page 2013
[16]

and Yuan, X

He, B. and Yuan, X. On the O (1/n) convergence rate of the douglascrachford alternating direction method. SIAM Journal on Numerical Analysis, 50 0 (2): 0 700--709, 2012

work page 2012
[17]

Adaptive quadratically regularized newton method for riemannian optimization

Hu, J., Milzarek, A., Wen, Z., and Yuan, Y. Adaptive quadratically regularized newton method for riemannian optimization. SIAM Journal on Matrix Analysis and Applications, 39 0 (3): 0 1181--1207, 2018

work page 2018
[18]

Structured quasi-newton methods for optimization with orthogonality constraints

Hu, J., Jiang, B., Lin, L., Wen, Z., and Yuan, Y.-x. Structured quasi-newton methods for optimization with orthogonality constraints. SIAM Journal on Scientific Computing, 41 0 (4): 0 A2239--A2269, 2019

work page 2019
[19]

A brief introduction to manifold optimization

Hu, J., Liu, X., Wen, Z.-W., and Yuan, Y.-X. A brief introduction to manifold optimization. Journal of the Operations Research Society of China, 8 0 (2): 0 199--248, 2020

work page 2020
[20]

J., Collins, M

Hwang, S. J., Collins, M. D., Ravi, S. N., Ithapu, V. K., Adluru, N., Johnson, S. C., and Singh, V. A projection free method for generalized eigenvalue problem with a nonsmooth regularizer. In IEEE International Conference on Computer Vision (ICCV), pp.\ 1841--1849, 2015

work page 2015
[21]

and Dai, Y.-H

Jiang, B. and Dai, Y.-H. A framework of constraint preserving update schemes for optimization on stiefel manifold. Mathematical Programming, 153 0 (2): 0 535--575, 2015

work page 2015
[22]

_p -norm regularization algorithms for optimization over permutation matrices

Jiang, B., Liu, Y.-F., and Wen, Z. _p -norm regularization algorithms for optimization over permutation matrices. SIAM Journal on Optimization, 26 0 (4): 0 2284--2313, 2016

work page 2016
[23]

An exact penalty approach for optimization with nonnegative orthogonality constraints

Jiang, B., Meng, X., Wen, Z., and Chen, X. An exact penalty approach for optimization with nonnegative orthogonality constraints. Mathematical Programming, pp.\ 1--43, 2022

work page 2022
[24]

Generalized power method for sparse principal component analysis

Journ \'e e, M., Nesterov, Y., Richt \'a rik, P., and Sepulchre, R. Generalized power method for sparse principal component analysis. Journal of Machine Learning Research, 11 0 (2), 2010

work page 2010
[25]

and Osher, S

Lai, R. and Osher, S. A splitting method for orthogonality constrained problems. Journal of Scientific Computing, 58 0 (2): 0 431--449, 2014

work page 2014
[26]

Weakly convex optimization over stiefel manifold using riemannian subgradient-type methods

Li, X., Chen, S., Deng, Z., Qu, Q., Zhu, Z., and Man-Cho So, A. Weakly convex optimization over stiefel manifold using riemannian subgradient-type methods. SIAM Journal on Optimization, 31 0 (3): 0 1605--1634, 2021

work page 2021
[27]

Liu, H., Wu, W., and So, A. M.-C. Quadratic optimization with orthogonality constraints: Explicit lojasiewicz exponent and linear convergence of line-search methods. In International Conference on Machine Learning (ICML), pp.\ 1158--1167, 2016

work page 2016
[28]

On the estimation performance and convergence rate of the generalized power method for phase synchronization

Liu, H., Yue, M.-C., and Man-Cho So, A. On the estimation performance and convergence rate of the generalized power method for phase synchronization. SIAM Journal on Optimization, 27 0 (4): 0 2426--2446, 2017

work page 2017
[29]

A class of smooth exact penalty function methods for optimization problems with orthogonality constraints

Liu, W., Zhang, Y., Yang, H., and Zhang, S. A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optimization, 69 0 (3): 0 399--426, 2020

work page 2020
[30]

On the convergence of the self-consistent field iteration in kohn--sham density functional theory

Liu, X., Wang, X., Wen, Z., and Yuan, Y. On the convergence of the self-consistent field iteration in kohn--sham density functional theory. SIAM Journal on Matrix Analysis and Applications, 35 0 (2): 0 546--558, 2014

work page 2014
[31]

An efficient gauss--newton algorithm for symmetric low-rank product matrix approximations

Liu, X., Wen, Z., and Zhang, Y. An efficient gauss--newton algorithm for symmetric low-rank product matrix approximations. SIAM Journal on Optimization, 25 0 (3): 0 1571--1608, 2015

work page 2015
[32]

and Zhang, Y

Lu, Z. and Zhang, Y. An augmented lagrangian approach for sparse principal component analysis. Mathematical Programming, 135 0 (1-2): 0 149--193, 2012

work page 2012
[33]

Optimization with first-order surrogate functions

Mairal, J. Optimization with first-order surrogate functions. In International Conference on Machine Learning (ICML), volume 28, pp.\ 783--791, 2013

work page 2013
[34]

and Abrol, V

Massart, E. and Abrol, V. Coordinate descent on the orthogonal group for recurrent neural network training. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 36, pp.\ 7744--7751, 2022

work page 2022
[35]

Mordukhovich, B. S. Variational analysis and generalized differentiation i: Basic theory. Berlin Springer, 330, 2006

work page 2006
[36]

Coordinate descent method for k-means

Nie, F., Xue, J., Wu, D., Wang, R., Li, H., and Li, X. Coordinate descent method for k-means. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 0 (5): 0 2371--2385, 2022

work page 2022
[37]

Exact penalty methods for minimizing a smooth function over the nonnegative orthogonal set

Qian, Y., Pan, S., and Xiao, L. Exact penalty methods for minimizing a smooth function over the nonnegative orthogonal set. arXiv, 11 2021

work page 2021
[38]

A unified convergence analysis of block successive minimization methods for nonsmooth optimization

Razaviyayn, M., Hong, M., and Luo, Z. A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization, 23 0 (2): 0 1126--1153, 2013

work page 2013
[39]

Rockafellar, R. T. and Wets., R. J.-B. Variational analysis. Springer Science & Business Media, 317, 2009

work page 2009
[40]

and Chechik, G

Shalit, U. and Chechik, G. Coordinate-descent for learning orthogonal matrices through givens rotations. In International Conference on Machine Learning (ICML), pp.\ 548--556. PMLR, 2014

work page 2014
[41]

and Bischof, C

Sun, X. and Bischof, C. A basis-kernel representation of orthogonal matrices. SIAM Journal on Matrix Analysis and Applications, 16 0 (4): 0 1184--1196, 1995

work page 1995
[42]

and Yun, S

Tseng, P. and Yun, S. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117 0 (1-2): 0 387--423, 2009

work page 2009
[43]

Q., Cho, J., Lei, J., and Rohe, K

Vu, V. Q., Cho, J., Lei, J., and Rohe, K. Fantope projection and selection: A near-optimal convex relaxation of sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 26, 2013

work page 2013
[44]

and Yin, W

Wen, Z. and Yin, W. A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142 0 (1-2): 0 397, 2013

work page 2013
[45]

Trace-penalty minimization for large-scale eigenspace computation

Wen, Z., Yang, C., Liu, X., and Zhang, Y. Trace-penalty minimization for large-scale eigenspace computation. Journal of Scientific Computing, 66 0 (3): 0 1175--1203, 2016

work page 2016
[46]

Quartic equation: https://en.wikipedia.org/wiki/Quartic_equation

WikiContributors. Quartic equation: https://en.wikipedia.org/wiki/Quartic_equation. pp.\ Last edited in March, 2023

work page 2023
[47]

and Yin, W

Xu, Y. and Yin, W. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6 0 (3): 0 1758--1789, 2013

work page 2013
[48]

Coordinate descent methods for fractional minimization

Yuan, G. Coordinate descent methods for fractional minimization. arXiv, 2022

work page 2022
[49]

Coordinate descent methods for dc minimization: Optimality conditions and global convergence

Yuan, G. Coordinate descent methods for dc minimization: Optimality conditions and global convergence. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023

work page 2023
[50]

A block decomposition algorithm for sparse optimization

Yuan, G., Shen, L., and Zheng, W.-S. A block decomposition algorithm for sparse optimization. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), 2020

work page 2020
[51]

and Shashua, A

Zass, R. and Shashua, A. Nonnegative sparse PCA . Advances in Neural Information Processing Systems (NeurIPS), pp.\ 1561--1568, 2006 a

work page 2006
[52]

and Shashua, A

Zass, R. and Shashua, A. Nonnegative sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 19, 2006 b

work page 2006
[53]

T.-K., Lin, S., and Yao, Y

Zeng, J., Lau, T. T.-K., Lin, S., and Yao, Y. Global convergence of block coordinate descent in deep learning. In International Conference on Machine Learning (ICML), pp.\ 7313--7323. PMLR, 2019

work page 2019
[54]

Complete dictionary learning via l4-norm maximization over the orthogonal group

Zhai, Y., Yang, Z., Liao, Z., Wright, J., and Ma, Y. Complete dictionary learning via l4-norm maximization over the orthogonal group. Journal of Machine Learning Research, 21 0 (165): 0 1--68, 2020

work page 2020
[55]

Primal-dual optimization algorithms over riemannian manifolds: an iteration complexity analysis

Zhang, J., Ma, S., and Zhang, S. Primal-dual optimization algorithms over riemannian manifolds: an iteration complexity analysis. Mathematical Programming, pp.\ 1--46, 2019

work page 2019
[56]

Gradient type optimization methods for electronic structure calculations

Zhang, X., Zhu, J., Wen, Z., and Zhou, A. Gradient type optimization methods for electronic structure calculations. SIAM Journal on Scientific Computing, 36 0 (3): 0 265--289, 2014

work page 2014
[57]

A riemannian newton algorithm for nonlinear eigenvalue problems

Zhao, Z., Bai, Z.-J., and Jin, X.-Q. A riemannian newton algorithm for nonlinear eigenvalue problems. SIAM Journal on Matrix Analysis and Applications, 36 0 (2): 0 752--774, 2015

work page 2015
[58]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[1] [1]

Advances in Neural Information Processing Systems (NeurIPS), 26, 2013

Fantope projection and selection: A near-optimal convex relaxation of sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 26, 2013

work page 2013

[2] [2]

E., Eriksson, J., and Koivunen, V

Abrudan, T. E., Eriksson, J., and Koivunen, V. Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Transactions on Signal Processing, 56 0 (3): 0 1134--1147, 2008

work page 2008

[3] [3]

E., and Sepulchre, R

Absil, P., Mahony, R. E., and Sepulchre, R. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008

work page 2008

[4] [4]

Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-lojasiewicz inequality

Attouch, H., Bolte, J., Redont, P., and Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-lojasiewicz inequality. Mathematics of Operations Research, 35 0 (2): 0 438--457, 2010

work page 2010

[5] [5]

Deep layers as stochastic solvers

Bibi, A., Ghanem, B., Koltun, V., and Ranftl, R. Deep layers as stochastic solvers. In International Conference on Learning Representations (ICLR). OpenReview.net, 2019

work page 2019

[6] [6]

Proximal alternating linearized minimization for nonconvex and nonsmooth problems

Bolte, J., Sabach, S., and Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146 0 (1-2): 0 459--494, 2014

work page 2014

[7] [7]

and Absil, P.-a

Boumal, N. and Absil, P.-a. Rtrmc: A riemannian trust-region method for low-rank matrix completion. Advances in Neural Information Processing Systems (NeurIPS), 24, 2011

work page 2011

[8] [8]

Proximal gradient method for nonsmooth optimization over the stiefel manifold

Chen, S., Ma, S., Man-Cho So, A., and Zhang, T. Proximal gradient method for nonsmooth optimization over the stiefel manifold. SIAM Journal on Optimization, 30 0 (1): 0 210--239, 2020

work page 2020

[9] [9]

An augmented lagrangian method for _1 -regularized optimization problems with orthogonality constraints

Chen, W., Ji, H., and You, Y. An augmented lagrangian method for _1 -regularized optimization problems with orthogonality constraints. SIAM Journal on Scientific Computing, 38 0 (4): 0 B570--B592, 2016

work page 2016

[10] [10]

Optimal solutions for sparse principal component analysis

d'Aspremont, A., Bach, F., and El Ghaoui, L. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9 0 (7), 2008

work page 2008

[11] [11]

A., and Smith, S

Edelman, A., Arias, T. A., and Smith, S. T. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20 0 (2): 0 303--353, 1998

work page 1998

[12] [12]

and Bruna, J

Frerix, T. and Bruna, J. Approximating orthogonal matrices with effective givens factorization. In International Conference on Machine Learning (ICML), pp.\ 1993--2001, 2019

work page 1993

[13] [13]

A new first-order algorithmic framework for optimization problems with orthogonality constraints

Gao, B., Liu, X., Chen, X., and Yuan, Y.-x. A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM Journal on Optimization, 28 0 (1): 0 302--332, 2018

work page 2018

[14] [14]

Parallelizable algorithms for optimization problems with orthogonality constraints

Gao, B., Liu, X., and Yuan, Y.-x. Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM Journal on Scientific Computing, 41 0 (3): 0 A1949--A1983, 2019

work page 2019

[15] [15]

Golub, G. H. and Van Loan, C. F. Matrix computations. 2013

work page 2013

[16] [16]

and Yuan, X

He, B. and Yuan, X. On the O (1/n) convergence rate of the douglascrachford alternating direction method. SIAM Journal on Numerical Analysis, 50 0 (2): 0 700--709, 2012

work page 2012

[17] [17]

Adaptive quadratically regularized newton method for riemannian optimization

Hu, J., Milzarek, A., Wen, Z., and Yuan, Y. Adaptive quadratically regularized newton method for riemannian optimization. SIAM Journal on Matrix Analysis and Applications, 39 0 (3): 0 1181--1207, 2018

work page 2018

[18] [18]

Structured quasi-newton methods for optimization with orthogonality constraints

Hu, J., Jiang, B., Lin, L., Wen, Z., and Yuan, Y.-x. Structured quasi-newton methods for optimization with orthogonality constraints. SIAM Journal on Scientific Computing, 41 0 (4): 0 A2239--A2269, 2019

work page 2019

[19] [19]

A brief introduction to manifold optimization

Hu, J., Liu, X., Wen, Z.-W., and Yuan, Y.-X. A brief introduction to manifold optimization. Journal of the Operations Research Society of China, 8 0 (2): 0 199--248, 2020

work page 2020

[20] [20]

J., Collins, M

Hwang, S. J., Collins, M. D., Ravi, S. N., Ithapu, V. K., Adluru, N., Johnson, S. C., and Singh, V. A projection free method for generalized eigenvalue problem with a nonsmooth regularizer. In IEEE International Conference on Computer Vision (ICCV), pp.\ 1841--1849, 2015

work page 2015

[21] [21]

and Dai, Y.-H

Jiang, B. and Dai, Y.-H. A framework of constraint preserving update schemes for optimization on stiefel manifold. Mathematical Programming, 153 0 (2): 0 535--575, 2015

work page 2015

[22] [22]

_p -norm regularization algorithms for optimization over permutation matrices

Jiang, B., Liu, Y.-F., and Wen, Z. _p -norm regularization algorithms for optimization over permutation matrices. SIAM Journal on Optimization, 26 0 (4): 0 2284--2313, 2016

work page 2016

[23] [23]

An exact penalty approach for optimization with nonnegative orthogonality constraints

Jiang, B., Meng, X., Wen, Z., and Chen, X. An exact penalty approach for optimization with nonnegative orthogonality constraints. Mathematical Programming, pp.\ 1--43, 2022

work page 2022

[24] [24]

Generalized power method for sparse principal component analysis

Journ \'e e, M., Nesterov, Y., Richt \'a rik, P., and Sepulchre, R. Generalized power method for sparse principal component analysis. Journal of Machine Learning Research, 11 0 (2), 2010

work page 2010

[25] [25]

and Osher, S

Lai, R. and Osher, S. A splitting method for orthogonality constrained problems. Journal of Scientific Computing, 58 0 (2): 0 431--449, 2014

work page 2014

[26] [26]

Weakly convex optimization over stiefel manifold using riemannian subgradient-type methods

Li, X., Chen, S., Deng, Z., Qu, Q., Zhu, Z., and Man-Cho So, A. Weakly convex optimization over stiefel manifold using riemannian subgradient-type methods. SIAM Journal on Optimization, 31 0 (3): 0 1605--1634, 2021

work page 2021

[27] [27]

Liu, H., Wu, W., and So, A. M.-C. Quadratic optimization with orthogonality constraints: Explicit lojasiewicz exponent and linear convergence of line-search methods. In International Conference on Machine Learning (ICML), pp.\ 1158--1167, 2016

work page 2016

[28] [28]

On the estimation performance and convergence rate of the generalized power method for phase synchronization

Liu, H., Yue, M.-C., and Man-Cho So, A. On the estimation performance and convergence rate of the generalized power method for phase synchronization. SIAM Journal on Optimization, 27 0 (4): 0 2426--2446, 2017

work page 2017

[29] [29]

A class of smooth exact penalty function methods for optimization problems with orthogonality constraints

Liu, W., Zhang, Y., Yang, H., and Zhang, S. A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optimization, 69 0 (3): 0 399--426, 2020

work page 2020

[30] [30]

On the convergence of the self-consistent field iteration in kohn--sham density functional theory

Liu, X., Wang, X., Wen, Z., and Yuan, Y. On the convergence of the self-consistent field iteration in kohn--sham density functional theory. SIAM Journal on Matrix Analysis and Applications, 35 0 (2): 0 546--558, 2014

work page 2014

[31] [31]

An efficient gauss--newton algorithm for symmetric low-rank product matrix approximations

Liu, X., Wen, Z., and Zhang, Y. An efficient gauss--newton algorithm for symmetric low-rank product matrix approximations. SIAM Journal on Optimization, 25 0 (3): 0 1571--1608, 2015

work page 2015

[32] [32]

and Zhang, Y

Lu, Z. and Zhang, Y. An augmented lagrangian approach for sparse principal component analysis. Mathematical Programming, 135 0 (1-2): 0 149--193, 2012

work page 2012

[33] [33]

Optimization with first-order surrogate functions

Mairal, J. Optimization with first-order surrogate functions. In International Conference on Machine Learning (ICML), volume 28, pp.\ 783--791, 2013

work page 2013

[34] [34]

and Abrol, V

Massart, E. and Abrol, V. Coordinate descent on the orthogonal group for recurrent neural network training. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 36, pp.\ 7744--7751, 2022

work page 2022

[35] [35]

Mordukhovich, B. S. Variational analysis and generalized differentiation i: Basic theory. Berlin Springer, 330, 2006

work page 2006

[36] [36]

Coordinate descent method for k-means

Nie, F., Xue, J., Wu, D., Wang, R., Li, H., and Li, X. Coordinate descent method for k-means. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 0 (5): 0 2371--2385, 2022

work page 2022

[37] [37]

Exact penalty methods for minimizing a smooth function over the nonnegative orthogonal set

Qian, Y., Pan, S., and Xiao, L. Exact penalty methods for minimizing a smooth function over the nonnegative orthogonal set. arXiv, 11 2021

work page 2021

[38] [38]

A unified convergence analysis of block successive minimization methods for nonsmooth optimization

Razaviyayn, M., Hong, M., and Luo, Z. A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization, 23 0 (2): 0 1126--1153, 2013

work page 2013

[39] [39]

Rockafellar, R. T. and Wets., R. J.-B. Variational analysis. Springer Science & Business Media, 317, 2009

work page 2009

[40] [40]

and Chechik, G

Shalit, U. and Chechik, G. Coordinate-descent for learning orthogonal matrices through givens rotations. In International Conference on Machine Learning (ICML), pp.\ 548--556. PMLR, 2014

work page 2014

[41] [41]

and Bischof, C

Sun, X. and Bischof, C. A basis-kernel representation of orthogonal matrices. SIAM Journal on Matrix Analysis and Applications, 16 0 (4): 0 1184--1196, 1995

work page 1995

[42] [42]

and Yun, S

Tseng, P. and Yun, S. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117 0 (1-2): 0 387--423, 2009

work page 2009

[43] [43]

Q., Cho, J., Lei, J., and Rohe, K

Vu, V. Q., Cho, J., Lei, J., and Rohe, K. Fantope projection and selection: A near-optimal convex relaxation of sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 26, 2013

work page 2013

[44] [44]

and Yin, W

Wen, Z. and Yin, W. A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142 0 (1-2): 0 397, 2013

work page 2013

[45] [45]

Trace-penalty minimization for large-scale eigenspace computation

Wen, Z., Yang, C., Liu, X., and Zhang, Y. Trace-penalty minimization for large-scale eigenspace computation. Journal of Scientific Computing, 66 0 (3): 0 1175--1203, 2016

work page 2016

[46] [46]

Quartic equation: https://en.wikipedia.org/wiki/Quartic_equation

WikiContributors. Quartic equation: https://en.wikipedia.org/wiki/Quartic_equation. pp.\ Last edited in March, 2023

work page 2023

[47] [47]

and Yin, W

Xu, Y. and Yin, W. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6 0 (3): 0 1758--1789, 2013

work page 2013

[48] [48]

Coordinate descent methods for fractional minimization

Yuan, G. Coordinate descent methods for fractional minimization. arXiv, 2022

work page 2022

[49] [49]

Coordinate descent methods for dc minimization: Optimality conditions and global convergence

Yuan, G. Coordinate descent methods for dc minimization: Optimality conditions and global convergence. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023

work page 2023

[50] [50]

A block decomposition algorithm for sparse optimization

Yuan, G., Shen, L., and Zheng, W.-S. A block decomposition algorithm for sparse optimization. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), 2020

work page 2020

[51] [51]

and Shashua, A

Zass, R. and Shashua, A. Nonnegative sparse PCA . Advances in Neural Information Processing Systems (NeurIPS), pp.\ 1561--1568, 2006 a

work page 2006

[52] [52]

and Shashua, A

Zass, R. and Shashua, A. Nonnegative sparse pca. Advances in Neural Information Processing Systems (NeurIPS), 19, 2006 b

work page 2006

[53] [53]

T.-K., Lin, S., and Yao, Y

Zeng, J., Lau, T. T.-K., Lin, S., and Yao, Y. Global convergence of block coordinate descent in deep learning. In International Conference on Machine Learning (ICML), pp.\ 7313--7323. PMLR, 2019

work page 2019

[54] [54]

Complete dictionary learning via l4-norm maximization over the orthogonal group

Zhai, Y., Yang, Z., Liao, Z., Wright, J., and Ma, Y. Complete dictionary learning via l4-norm maximization over the orthogonal group. Journal of Machine Learning Research, 21 0 (165): 0 1--68, 2020

work page 2020

[55] [55]

Primal-dual optimization algorithms over riemannian manifolds: an iteration complexity analysis

Zhang, J., Ma, S., and Zhang, S. Primal-dual optimization algorithms over riemannian manifolds: an iteration complexity analysis. Mathematical Programming, pp.\ 1--46, 2019

work page 2019

[56] [56]

Gradient type optimization methods for electronic structure calculations

Zhang, X., Zhu, J., Wen, Z., and Zhou, A. Gradient type optimization methods for electronic structure calculations. SIAM Journal on Scientific Computing, 36 0 (3): 0 265--289, 2014

work page 2014

[57] [57]

A riemannian newton algorithm for nonlinear eigenvalue problems

Zhao, Z., Bai, Z.-J., and Jin, X.-Q. A riemannian newton algorithm for nonlinear eigenvalue problems. SIAM Journal on Matrix Analysis and Applications, 36 0 (2): 0 752--774, 2015

work page 2015

[58] [58]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page