Optimization over the intersection of manifolds

Bin Gao; Yan Yang; Ya-Xiang Yuan

arxiv: 2605.22736 · v1 · pith:K52NNVZNnew · submitted 2026-05-21 · 🧮 math.OC · cs.LG· cs.NA· math.DG· math.NA

Optimization over the intersection of manifolds

Yan Yang , Bin Gao , Ya-xiang Yuan This is my paper

Pith reviewed 2026-05-22 04:00 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.NAmath.DGmath.NA

keywords manifold optimizationintersection of manifoldsintrinsic transversalityclean intersectionretractionconvergence ratefirst-order stationaritysparse optimization

0 comments

The pith

The paper proves that clean intersection and intrinsic transversality are equivalent for two manifolds, enabling a tractable tangent space projection and a convergent geometric optimization method.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Optimization over the intersection of two manifolds is difficult because the geometry of the feasible set is coupled. The authors show that two standard regularity conditions on the pair of manifolds are in fact identical. This equivalence makes the orthogonal projection onto the tangent space of the intersection computable in a direct way. They then design an algorithm that keeps every iterate on one manifold, using one direction to approach the second manifold and a second orthogonal direction to reduce the objective function. Under the regularity condition the method supplies explicit rates for both feasibility and optimality measures while guaranteeing that limit points are first-order stationary.

Core claim

We prove that the regularities clean intersection and intrinsic transversality are equivalent, which yields a tractable projection onto the tangent space of the intersection. Therefore we propose a geometric method that employs a retraction on only one manifold and updates the iterate along two orthogonal directions. The iterates stay on one manifold, and the two directions are responsible for asymptotically approaching the other manifold and decreasing the objective function, respectively. Under intrinsic transversality we derive the convergence rate for both the feasibility and optimality measures, and show that every accumulation point is first-order stationary.

What carries the argument

The equivalence of clean intersection and intrinsic transversality, which supplies a tractable projection onto the tangent space of the intersection and supports the two orthogonal update directions.

If this is right

The projection onto the tangent space of the intersection becomes explicitly computable.
The algorithm produces iterates whose distance to the second manifold and whose objective decrease both admit explicit convergence rates.
Every limit point of the generated sequence satisfies the first-order stationarity condition on the intersection.
The same geometric splitting works on standard sparse and low-rank problems that can be cast as manifold intersection tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same equivalence may let analysts replace ad-hoc penalty or projection steps with the two-direction update in other manifold-constrained problems.
The orthogonal-direction splitting suggests a natural way to incorporate second-order information once a retraction and vector transport are chosen.
Success on hyperbolic embedding approximation indicates the method can be tested directly on graph-layout tasks without additional manifold-specific code.

Load-bearing premise

The manifolds satisfy the intrinsic transversality condition, equivalently that their intersection is clean.

What would settle it

A concrete pair of manifolds whose intersection is clean yet the tangent-space projection onto the intersection fails to be well-defined, or a run of the algorithm in which the feasibility measure does not converge at the stated rate.

read the original abstract

Optimization over the intersection of two manifolds arises in a broad range of applications, but is hindered by the coupled geometry of the feasible region. In this paper, we prove that the regularities -- clean intersection and intrinsic transversality -- are equivalent, which yields a tractable projection onto the tangent space of the intersection. Therefore, we propose a geometric method that employs a retraction on only one manifold and updates the iterate along two orthogonal directions. Specifically, the iterates stay on one manifold, and the two directions are responsible for asymptotically approaching the other manifold and decreasing the objective function, respectively. Under intrinsic transversality, we derive the convergence rate for both the feasibility and optimality measures, and show that every accumulation point is first-order stationary. Numerical experiments on problems stemming from sparse and low-rank optimization, including fitting spherical data, approximating hyperbolic embeddings on real data, and computing compressed modes, demonstrate the effectiveness of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proves clean intersection equals intrinsic transversality to get a tangent projection, then builds a retraction-on-one-manifold method with orthogonal feasibility and descent directions.

read the letter

The main point is that this paper proves clean intersection and intrinsic transversality are equivalent for two manifolds. That equivalence gives a tractable projection onto the tangent space of the intersection, which they use to design an algorithm that retracts only on the first manifold while updating along two orthogonal directions—one for approaching the second manifold and one for objective decrease. Under the transversality condition they obtain rates on both feasibility and optimality measures and show accumulation points are first-order stationary. This splitting approach for the intersection setting looks new compared with standard single-manifold retraction methods. The work connects directly to sparse and low-rank problems and includes experiments on spherical data fitting, hyperbolic embedding approximation on real data, and compressed modes, which gives it concrete grounding. A soft spot is that tractability of the projection and the orthogonal directions depends on whether the equivalence supplies an explicit, low-cost formula rather than just existence. If computing the directions requires an extra projection step whose cost is not bounded, the per-iteration work could be higher than the abstract suggests. The assumption of intrinsic transversality is also load-bearing for the rates, so readers will want to see how often it holds in the target applications. This is aimed at researchers already using manifold optimization for low-rank or embedding tasks who need to handle intersection constraints. A reader familiar with retraction techniques will recognize the extension without much trouble. The geometric idea is coherent and the applications are relevant, so the paper deserves a serious referee even if the proofs and implementation details need close checking. I would send it out for review.

Referee Report

2 major / 2 minor

Summary. The manuscript proves that clean intersection and intrinsic transversality are equivalent for two manifolds, which allows a tractable projection onto the tangent space of the intersection. Based on this, it develops a geometric optimization algorithm that uses a retraction on only one manifold and performs updates in two orthogonal directions: one for feasibility with respect to the second manifold and one for objective decrease. Convergence rates for feasibility and optimality measures are derived under the intrinsic transversality condition, every accumulation point is shown to be first-order stationary, and numerical experiments on sparse/low-rank optimization problems, spherical data fitting, hyperbolic embeddings, and compressed modes are presented to demonstrate effectiveness.

Significance. If the equivalence and convergence results hold, the work advances manifold-constrained optimization by reducing the need for retractions on both manifolds and providing explicit rates under a verifiable regularity condition. The equivalence theorem and the orthogonal-direction splitting are notable strengths, as is the application to concrete problems in sparse and low-rank recovery. The numerical validation on real data adds practical value.

major comments (2)

[§3] §3 (Algorithm 1 and surrounding text): The claimed tractability of the projection onto T(M1 ∩ M2) is central to the orthogonal splitting of the update directions, yet the manuscript supplies only an existence argument from the equivalence rather than an explicit, closed-form or efficiently computable expression (e.g., via normal-space decomposition). Without this, the per-iteration cost and the orthogonality guarantee cannot be verified from the given description.
[Theorem 4.3] Theorem 4.3 (accumulation-point stationarity): The proof invokes intrinsic transversality to ensure the limit point satisfies the first-order condition on the intersection, but the argument does not address the case where the transversality constant degenerates along the sequence; a quantitative bound on the constant would be needed to make the stationarity claim robust.

minor comments (2)

[Abstract] Abstract: The phrase 'clean intersection' appears without a one-sentence reminder of its definition or a pointer to the section where it is introduced.
[Numerical experiments] Numerical section: Tables reporting iteration counts and final objective values would benefit from reporting the number of random initializations and standard deviations to allow assessment of variability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive comments. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [§3] §3 (Algorithm 1 and surrounding text): The claimed tractability of the projection onto T(M1 ∩ M2) is central to the orthogonal splitting of the update directions, yet the manuscript supplies only an existence argument from the equivalence rather than an explicit, closed-form or efficiently computable expression (e.g., via normal-space decomposition). Without this, the per-iteration cost and the orthogonality guarantee cannot be verified from the given description.

Authors: We agree that greater explicitness would improve verifiability. The equivalence of clean intersection and intrinsic transversality established in Theorem 3.1 directly yields an explicit formula: the tangent space T(M1 ∩ M2) is the orthogonal complement of N_{M1} + N_{M2}. Consequently, the orthogonal projection onto this tangent space is obtained by solving a small linear least-squares problem whose matrix is assembled from orthonormal bases of the two normal spaces (whose dimensions equal the respective codimensions). When these codimensions are modest—as is the case in all numerical examples in the paper—the solve is inexpensive and can be performed with standard dense linear-algebra routines. We will add a dedicated paragraph and a short algorithmic box in §3 that spells out this construction, states its O(d^3) cost for normal-space dimension d, and verifies that the two update directions remain orthogonal by construction. revision: yes
Referee: [Theorem 4.3] Theorem 4.3 (accumulation-point stationarity): The proof invokes intrinsic transversality to ensure the limit point satisfies the first-order condition on the intersection, but the argument does not address the case where the transversality constant degenerates along the sequence; a quantitative bound on the constant would be needed to make the stationarity claim robust.

Authors: We thank the referee for highlighting this subtlety. Intrinsic transversality is a pointwise condition: at any point x in M1 ∩ M2 there exists δ(x) > 0 such that the angle between the normal spaces satisfies the stated lower bound. In the proof we pass to the limit and invoke the condition at the accumulation point x*, where δ(x*) > 0 by hypothesis. Nevertheless, to preclude any possibility that δ(x_k) → 0 along the sequence, we will strengthen the statement of Theorem 4.3 by adding the standing assumption that the transversality constant is uniformly bounded away from zero in a neighborhood of x*. This is a standard, mild regularity hypothesis in the manifold-optimization literature and will be accompanied by a brief remark explaining why it is satisfied in the concrete applications considered. The proof will be updated to cite the uniform bound explicitly when controlling the limiting first-order condition. revision: yes

Circularity Check

0 steps flagged

No circularity: equivalence proof and algorithm derivation are self-contained

full rationale

The paper establishes the equivalence of clean intersection and intrinsic transversality via a mathematical proof, which directly supports the existence of a tractable tangent-space projection for the intersection. This equivalence is then used to construct the update directions and derive convergence rates under the stated assumption. No quoted step reduces a claimed prediction or result to a fitted parameter, self-definition, or unverified self-citation chain; the derivation relies on standard manifold geometry and the internal proof rather than renaming known results or smuggling ansatzes. The central claims remain independent of the algorithm's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard assumptions from Riemannian geometry and manifold optimization without introducing new free parameters or invented entities.

axioms (1)

domain assumption The two manifolds are smooth Riemannian manifolds whose intersection satisfies the regularity conditions of clean intersection or intrinsic transversality.
This is the key regularity invoked to define the tangent space projection and derive convergence.

pith-pipeline@v0.9.0 · 5690 in / 1275 out tokens · 59157 ms · 2026-05-22T04:00:30.033375+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[1]

Ablin and G

[1]P. Ablin and G. Peyré,Fast and accurate optimization on the orthogonal manifold without retraction, in International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 5636–5657. [2]P. Ablin, S. V ary, B. Gao, and P.-A. Absil,Infeasible deterministic, stochastic, and variance-reduction algorithms for optimization under orthogonality ...

work page 2022
[2]

Andersson and M

[4]F. Andersson and M. Carlsson,Alternating projections on nontangential manifolds, Con- structive Approximation, 38 (2013), pp. 489–525. [5]R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt,On augmented Lagrangian methods with general lower-level constraints, SIAM Journal on Optimization, 18 (2008), pp. 1286–1309. [6]R. Andreani, K. R. Couto...

work page 2013
[3]

Boumal, P.-A

[13]N. Boumal, P.-A. Absil, and C. Cartis,Global rates of convergence for nonconvex opti- mization on manifolds, IMA Journal of Numerical Analysis, 39 (2019), pp. 1–33. [14]N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre,Manopt, a Matlab toolbox for op- timization on manifolds, The Journal of Machine Learning Research, 15 (2014), pp. 1455–

work page 2019
[4]

Retractions by Alternating Projections

[15]S. Budzinskiy,Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors, Numerische Mathematik, 157 (2025), pp. 1491–1535. [16]T. P. Cason, P.-A. Absil, and P. V an Dooren,Iterative methods for low rank approx- imation of graph similarity matrices, Linear Algebra and its Applications, 438 (2013), pp. 1863–188...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

[19]S. Chen, S. Ma, A. Man-Cho So, and T. Zhang,Proximal gradient method for nonsmooth optimization over the Stiefel manifold, SIAM Journal on Optimization, 30 (2020), pp. 210–

work page 2020
[6]

[20]M. Chu, N. Del Buono, L. Lopez, and T. Politi,On the low-rank approximation of data on the unit sphere, SIAM Journal on Matrix Analysis and Applications, 27 (2005), pp. 46–60. [21]D. Drusvyatskiy, A. D. Ioffe, and A. S. Lewis,Transversality and alternating projections for nonconvex sets, Foundations of Computational Mathematics, 15 (2015), pp. 1637–16...

work page arXiv 2005
[7]

[28]A. D. Ioffe,Transversality in variational analysis, Journal of Optimization Theory and Ap- plications, 174 (2017), pp. 343–366. [29]P. Ja w anpuria, M. Meghw anshi, and B. Mishra,Low-rank approximations of hyperbolic embeddings, in 2019 IEEE 58th conference on decision and control, IEEE, 2019, pp. 7159–

work page 2017
[8]

[30]X. Jia, C. Kanzow, P. Mehlitz, and G. W achsmuth,An augmented Lagrangian method for optimization problems with structured geometric constraints, Mathematical Program- ming, 199 (2023), pp. 1365–1415. [31]Z. Lai and A. Yoshise,Riemannian interior point methods for constrained optimization on manifolds, Journal of Optimization Theory and Applications, 2...

work page 2023
[9]

Levin, J

[33]E. Levin, J. Kileel, and N. Boumal,The effect of smooth parametrizations on nonconvex optimization landscapes, Mathematical Programming, 209 (2025), pp. 63–111. [34]A. S. Lewis, D. R. Luke, and J. Malick,Local linear convergence for alternating and averaged nonconvex projections, Foundations of Computational Mathematics, 9 (2009), pp. 485–513. [35]A. ...

work page 2025
[10]

[37]X. Li, N. Xiu, and S. Zhou,Matrix optimization over low-rank spectral sets: stationary points and local and global minimizers, Journal of Optimization Theory and Applications, 184 (2020), pp. 895–930. [38]C. Liu and N. Boumal,Simple algorithms for optimization on Riemannian manifolds with constraints, Applied Mathematics & Optimization, 82 (2020), pp....

work page 2020
[11]

[40]G. A. Miller,WordNet: a lexical database for English, Communications of the ACM, 38 (1995), pp. 39–41. [41]V. Neumann,Functional operators, The Geometry of Orthogonal Spaces, (1950). [42]M. Nickel and D. Kiela,Poincaré embeddings for learning hierarchical representations, Advances in neural information processing systems, 30 (2017). [43]J. Nocedal and...

work page 1995
[12]

Noll and A

[45]D. Noll and A. Rondepierre,On local convergence of the method of alternating projections, Foundations of Computational Mathematics, 16 (2016), pp. 425–455. [46]M. Obara, T. Okuno, and A. Takeda,Sequential quadratic optimization for nonlinear op- timization problems on Riemannian manifolds, SIAM Journal on Optimization, 32 (2022), pp. 822–853. [47]G. O...

work page 2016
[13]

Ozolin , š, R

[48]V. Ozolin , š, R. Lai, R. Caflisch, and S. Osher,Compressed modes for variational prob- lems in mathematics and physics, Proceedings of the National Academy of Sciences, 110 (2013), pp. 18368–18373. [49]R. Peng, C. Zhu, B. Gao, X. W ang, and Y.-x. Yuan,Normalized tensor train decompo- sition, arXiv preprint arXiv:2511.04369, (2025). [50]R. T. Rockafel...

work page arXiv 2013
[14]

26Y. YANG, B. GAO, AND Y.-X. YUAN [52]J. B. Rosen,The gradient projection method for nonlinear programming. Part I. Linear con- straints, Journal of the Society for Industrial and Applied Mathematics, 8 (1960), pp. 181–

work page 1960
[15]

[53]J. B. Rosen,The gradient projection method for nonlinear programming. Part II. Nonlin- ear constraints, Journal of the Society for Industrial and Applied Mathematics, 9 (1961), pp. 514–532. [54]S. Schechtman, D. Tiapkin, M. Muehlebach, and E. Moulines,Orthogonal directions constrained gradient method: from non-linear equality constraints to Stiefel ma...

work page 1961
[16]

V andereycken,Low-rank matrix completion by Riemannian optimization, SIAM Journal on Optimization, 23 (2013), pp

[58]B. V andereycken,Low-rank matrix completion by Riemannian optimization, SIAM Journal on Optimization, 23 (2013), pp. 1214–1236. [59]S. V ary, P. Ablin, B. Gao, and P.-A. Absil,Optimization without retraction on the random generalized Stiefel manifold, in Proceedings of the 41st International Conference on Machine Learning, vol. 235, PMLR, 2024, pp. 49...

work page arXiv 2013

[1] [1]

Ablin and G

[1]P. Ablin and G. Peyré,Fast and accurate optimization on the orthogonal manifold without retraction, in International Conference on Artificial Intelligence and Statistics, PMLR, 2022, pp. 5636–5657. [2]P. Ablin, S. V ary, B. Gao, and P.-A. Absil,Infeasible deterministic, stochastic, and variance-reduction algorithms for optimization under orthogonality ...

work page 2022

[2] [2]

Andersson and M

[4]F. Andersson and M. Carlsson,Alternating projections on nontangential manifolds, Con- structive Approximation, 38 (2013), pp. 489–525. [5]R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt,On augmented Lagrangian methods with general lower-level constraints, SIAM Journal on Optimization, 18 (2008), pp. 1286–1309. [6]R. Andreani, K. R. Couto...

work page 2013

[3] [3]

Boumal, P.-A

[13]N. Boumal, P.-A. Absil, and C. Cartis,Global rates of convergence for nonconvex opti- mization on manifolds, IMA Journal of Numerical Analysis, 39 (2019), pp. 1–33. [14]N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre,Manopt, a Matlab toolbox for op- timization on manifolds, The Journal of Machine Learning Research, 15 (2014), pp. 1455–

work page 2019

[4] [4]

Retractions by Alternating Projections

[15]S. Budzinskiy,Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors, Numerische Mathematik, 157 (2025), pp. 1491–1535. [16]T. P. Cason, P.-A. Absil, and P. V an Dooren,Iterative methods for low rank approx- imation of graph similarity matrices, Linear Algebra and its Applications, 438 (2013), pp. 1863–188...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[5] [5]

[19]S. Chen, S. Ma, A. Man-Cho So, and T. Zhang,Proximal gradient method for nonsmooth optimization over the Stiefel manifold, SIAM Journal on Optimization, 30 (2020), pp. 210–

work page 2020

[6] [6]

[20]M. Chu, N. Del Buono, L. Lopez, and T. Politi,On the low-rank approximation of data on the unit sphere, SIAM Journal on Matrix Analysis and Applications, 27 (2005), pp. 46–60. [21]D. Drusvyatskiy, A. D. Ioffe, and A. S. Lewis,Transversality and alternating projections for nonconvex sets, Foundations of Computational Mathematics, 15 (2015), pp. 1637–16...

work page arXiv 2005

[7] [7]

[28]A. D. Ioffe,Transversality in variational analysis, Journal of Optimization Theory and Ap- plications, 174 (2017), pp. 343–366. [29]P. Ja w anpuria, M. Meghw anshi, and B. Mishra,Low-rank approximations of hyperbolic embeddings, in 2019 IEEE 58th conference on decision and control, IEEE, 2019, pp. 7159–

work page 2017

[8] [8]

[30]X. Jia, C. Kanzow, P. Mehlitz, and G. W achsmuth,An augmented Lagrangian method for optimization problems with structured geometric constraints, Mathematical Program- ming, 199 (2023), pp. 1365–1415. [31]Z. Lai and A. Yoshise,Riemannian interior point methods for constrained optimization on manifolds, Journal of Optimization Theory and Applications, 2...

work page 2023

[9] [9]

Levin, J

[33]E. Levin, J. Kileel, and N. Boumal,The effect of smooth parametrizations on nonconvex optimization landscapes, Mathematical Programming, 209 (2025), pp. 63–111. [34]A. S. Lewis, D. R. Luke, and J. Malick,Local linear convergence for alternating and averaged nonconvex projections, Foundations of Computational Mathematics, 9 (2009), pp. 485–513. [35]A. ...

work page 2025

[10] [10]

[37]X. Li, N. Xiu, and S. Zhou,Matrix optimization over low-rank spectral sets: stationary points and local and global minimizers, Journal of Optimization Theory and Applications, 184 (2020), pp. 895–930. [38]C. Liu and N. Boumal,Simple algorithms for optimization on Riemannian manifolds with constraints, Applied Mathematics & Optimization, 82 (2020), pp....

work page 2020

[11] [11]

[40]G. A. Miller,WordNet: a lexical database for English, Communications of the ACM, 38 (1995), pp. 39–41. [41]V. Neumann,Functional operators, The Geometry of Orthogonal Spaces, (1950). [42]M. Nickel and D. Kiela,Poincaré embeddings for learning hierarchical representations, Advances in neural information processing systems, 30 (2017). [43]J. Nocedal and...

work page 1995

[12] [12]

Noll and A

[45]D. Noll and A. Rondepierre,On local convergence of the method of alternating projections, Foundations of Computational Mathematics, 16 (2016), pp. 425–455. [46]M. Obara, T. Okuno, and A. Takeda,Sequential quadratic optimization for nonlinear op- timization problems on Riemannian manifolds, SIAM Journal on Optimization, 32 (2022), pp. 822–853. [47]G. O...

work page 2016

[13] [13]

Ozolin , š, R

[48]V. Ozolin , š, R. Lai, R. Caflisch, and S. Osher,Compressed modes for variational prob- lems in mathematics and physics, Proceedings of the National Academy of Sciences, 110 (2013), pp. 18368–18373. [49]R. Peng, C. Zhu, B. Gao, X. W ang, and Y.-x. Yuan,Normalized tensor train decompo- sition, arXiv preprint arXiv:2511.04369, (2025). [50]R. T. Rockafel...

work page arXiv 2013

[14] [14]

26Y. YANG, B. GAO, AND Y.-X. YUAN [52]J. B. Rosen,The gradient projection method for nonlinear programming. Part I. Linear con- straints, Journal of the Society for Industrial and Applied Mathematics, 8 (1960), pp. 181–

work page 1960

[15] [15]

[53]J. B. Rosen,The gradient projection method for nonlinear programming. Part II. Nonlin- ear constraints, Journal of the Society for Industrial and Applied Mathematics, 9 (1961), pp. 514–532. [54]S. Schechtman, D. Tiapkin, M. Muehlebach, and E. Moulines,Orthogonal directions constrained gradient method: from non-linear equality constraints to Stiefel ma...

work page 1961

[16] [16]

V andereycken,Low-rank matrix completion by Riemannian optimization, SIAM Journal on Optimization, 23 (2013), pp

[58]B. V andereycken,Low-rank matrix completion by Riemannian optimization, SIAM Journal on Optimization, 23 (2013), pp. 1214–1236. [59]S. V ary, P. Ablin, B. Gao, and P.-A. Absil,Optimization without retraction on the random generalized Stiefel manifold, in Proceedings of the 41st International Conference on Machine Learning, vol. 235, PMLR, 2024, pp. 49...

work page arXiv 2013