pith. machine review for the scientific record. sign in

arxiv: 2605.05446 · v1 · submitted 2026-05-06 · 📊 stat.ML · cs.IT· cs.LG· math.IT· math.OC

Recognition: unknown

Convexity in Disguise: A Theoretical Framework for Nonconvex Low-Rank Matrix Estimation

Authors on Pith no claims yet

Pith reviewed 2026-05-08 15:46 UTC · model grok-4.3

classification 📊 stat.ML cs.ITcs.LGmath.ITmath.OC
keywords low-rank matrix estimationnonconvex optimizationbenign regularizerlocal strong convexitydisguised convexitytheoretical frameworkmachine learning
0
0 comments X

The pith

Nonconvex low-rank matrix estimation procedures are equivalent to locally strongly convex problems via a benign regularizer that leaves the algorithm unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a general framework to explain the success of nonconvex methods for low-rank matrix estimation without relying on extra regularization that is often unnecessary in practice. It introduces a benign regularizer that reformulates the nonconvex update rule into an equivalent locally strongly convex problem while keeping the original algorithm identical. This reveals a disguised convexity that allows standard convex analysis tools to provide guarantees. A sympathetic reader cares because the approach avoids problem-specific arguments and applies to a broad class of estimation tasks common in machine learning. If correct, it unifies theoretical understanding across different nonconvex procedures.

Core claim

The central claim is that for a broad class of low-rank matrix estimation problems, there exists a benign regularizer which does not alter the original nonconvex update rule but yields an equivalent formulation that is locally strongly convex. This perspective uncovers a disguised convexity inherent in the nonconvex procedure and provides a new route to theoretical guarantees for nonconvex low-rank matrix estimation.

What carries the argument

The benign regularizer, a device that reformulates the nonconvex algorithm to reveal local strong convexity without changing the practical update steps.

If this is right

  • Theoretical guarantees for nonconvex methods follow from analyzing the equivalent locally strongly convex problem.
  • The framework applies across many different low-rank estimation models instead of requiring separate proofs for each.
  • Practical implementations of the algorithms remain exactly the same while gaining access to convex analysis techniques.
  • Additional regularization is often unnecessary because the disguised convexity already controls the behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same benign-regularizer idea could be tested on other nonconvex problems outside low-rank matrices, such as certain tensor or neural network training tasks.
  • Algorithm designers might deliberately search for benign regularizers when creating new nonconvex procedures to make analysis easier.
  • If the construction works for matrix completion and phase retrieval, it suggests the disguised convexity is common in high-dimensional estimation.
  • Empirical checks could involve verifying that the regularizer leaves iterates unchanged on standard benchmark datasets.

Load-bearing premise

That a benign regularizer exists and can be constructed for a broad class of low-rank matrix estimation problems such that the equivalence to local strong convexity holds while leaving the practical algorithm unchanged.

What would settle it

A concrete low-rank matrix estimation problem for which no benign regularizer can be constructed that preserves the exact original update rule while producing an equivalent locally strongly convex formulation.

read the original abstract

Nonconvex methods have emerged as a dominant approach for low-rank matrix estimation, a problem that arises widely in machine learning and AI for learning and representing high-dimensional data. Existing analyses for these methods often require additional regularization to mitigate nonconvexity, even though such regularization is often unnecessary in practice. Moreover, most analyses rely on problem-specific arguments that are difficult to generalize to more complex settings. In this paper, we develop a theoretical framework for studying nonconvex procedures across a broad class of low-rank matrix estimation problems. Rather than focusing on a specific model, we reveal a fundamental mechanism that explains why nonconvex procedures can behave well in low-rank estimation. Our key device is a {\it benign regularizer} that does not alter the original update rule, but yields an equivalent locally strongly convex formulation of the algorithm. This perspective uncovers a disguised convexity inherent in the nonconvex procedure and provides a new route to theoretical guarantees for nonconvex low-rank matrix estimation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript develops a theoretical framework for nonconvex low-rank matrix estimation problems such as matrix sensing, completion, and robust variants. It introduces a benign regularizer that leaves the original update rule unchanged while producing an equivalent locally strongly convex formulation of the algorithm. The equivalence is established by verifying that the regularizer gradient vanishes identically along the algorithm trajectory, with its Hessian supplying the missing strong-convexity term. This perspective reveals an inherent disguised convexity and supplies a unified route to theoretical guarantees without problem-specific arguments.

Significance. If the central construction holds, the result is significant: it supplies an explicit general construction that applies across the stated class of low-rank estimation problems without requiring knowledge of the unknown solution or altering the iterates. The paper thereby offers a reproducible, non-problem-specific mechanism that explains the practical success of nonconvex procedures and opens a new avenue for deriving guarantees in more complex settings.

minor comments (2)
  1. [Section 3] In the statement of the main equivalence result (around the construction of the benign regularizer), explicitly note the precise trajectory set on which the gradient vanishes to make the local strong-convexity claim fully unambiguous.
  2. [Section 5] The discussion of how the framework extends to robust variants would benefit from a short remark on whether the same benign regularizer construction applies verbatim or requires a minor modification to the Hessian term.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our manuscript. The recommendation for minor revision is noted, and we will address any editorial or minor polishing points in the revised version. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper constructs an explicit benign regularizer whose gradient vanishes on the algorithm trajectory while its Hessian supplies the local strong-convexity term. This equivalence is verified directly from the update rule and problem geometry for matrix sensing, completion, and robust variants, without fitting parameters to data, invoking self-citations for uniqueness, or redefining the target result in terms of itself. The central claim therefore rests on an independent algebraic verification rather than any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of a benign regularizer for the broad class of problems; this device is introduced by the paper itself.

axioms (1)
  • domain assumption Standard domain assumptions on low-rank matrix estimation problems (e.g., restricted isometry or incoherence conditions) that allow local strong convexity to be established.
    Invoked implicitly to ensure the benign regularizer produces the desired equivalence.
invented entities (1)
  • Benign regularizer no independent evidence
    purpose: To reformulate the nonconvex procedure as locally strongly convex without changing the update rule.
    Newly postulated device whose construction and properties are the core contribution.

pith-pipeline@v0.9.0 · 5474 in / 1234 out tokens · 35976 ms · 2026-05-08T15:46:44.203398+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

61 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    Implicit regularization in deep matrix factorization.Advances in Neural Information Processing Systems, 32, 2019

    Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization.Advances in Neural Information Processing Systems, 32, 2019

  2. [2]

    Sharp nonasymptotic bounds on the norm of random matrices with independent entries.Annals of Probability, 44(4):2479–2506, 2016

    Afonso S Bandeira and Ramon Van Handel. Sharp nonasymptotic bounds on the norm of random matrices with independent entries.Annals of Probability, 44(4):2479–2506, 2016

  3. [3]

    Global optimality of local search for low rank matrix recovery.Advances in Neural Information Processing Systems, 29, 2016

    Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro. Global optimality of local search for low rank matrix recovery.Advances in Neural Information Processing Systems, 29, 2016

  4. [4]

    Monteiro

    Samuel Burer and Renato D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization.Mathematical programming, 95(2):329–357, 2003

  5. [5]

    A max-norm constrained minimization approach to 1-bit matrix completion.Journal of Machine Learning Research, 14(114):3619–3647, 2013

    Tony Cai and Wen-Xin Zhou. A max-norm constrained minimization approach to 1-bit matrix completion.Journal of Machine Learning Research, 14(114):3619–3647, 2013

  6. [6]

    Exact matrix completion via convex optimization

    Emmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111–119, 2012

  7. [7]

    Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

    Emmanuel J Candes and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

  8. [8]

    The power of convex relaxation: Near-optimal matrix completion.IEEE Transactions on Information Theory, 56(5):2053–2080, 2010

    Emmanuel J Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion.IEEE Transactions on Information Theory, 56(5):2053–2080, 2010

  9. [9]

    Robust principal component analysis?Journal of the ACM (JACM), 58(3):1–37, 2011

    Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis?Journal of the ACM (JACM), 58(3):1–37, 2011

  10. [10]

    Phase retrieval via Wirtinger flow: Theory and algorithms.IEEE Transactions on Information Theory, 61(4):1985–2007, 2015

    Emmanuel J Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms.IEEE Transactions on Information Theory, 61(4):1985–2007, 2015

  11. [11]

    Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177–214, 2015

    Sourav Chatterjee. Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177–214, 2015

  12. [12]

    Nonconvex rectangular matrix completion via gradient descent withoutℓ2,∞ regularization.IEEE Transactions on Information Theory, 66(9):5806–5841, 2020

    Ji Chen, Dekai Liu, and Xiaodong Li. Nonconvex rectangular matrix completion via gradient descent withoutℓ2,∞ regularization.IEEE Transactions on Information Theory, 66(9):5806–5841, 2020. 12

  13. [13]

    Incoherence-optimal matrix completion.IEEE Transactions on Information Theory, 61(5):2909–2923, 2015

    Yudong Chen. Incoherence-optimal matrix completion.IEEE Transactions on Information Theory, 61(5):2909–2923, 2015

  14. [14]

    Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees.arXiv preprint arXiv:1509.03025, 2015

    Yudong Chen and Martin J Wainwright. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees.arXiv preprint arXiv:1509.03025, 2015

  15. [15]

    Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization.SIAM Journal on Optimization, 30(4):3098–3121, 2020

    Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, and Yuling Yan. Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization.SIAM Journal on Optimization, 30(4):3098–3121, 2020

  16. [16]

    Nearly optimal robust matrix completion

    Yeshwanth Cherapanamjeri, Kartik Gupta, and Prateek Jain. Nearly optimal robust matrix completion. InInternational Conference on Machine Learning, pages 797–805. PMLR, 2017

  17. [17]

    Nonconvex optimization meets low-rank matrix factorization: An overview.IEEE Transactions on Signal Processing, 67(20):5239–5269, 2019

    Yuejie Chi, Yue M Lu, and Yuxin Chen. Nonconvex optimization meets low-rank matrix factorization: An overview.IEEE Transactions on Signal Processing, 67(20):5239–5269, 2019

  18. [18]

    Identifiability and inference for generalized latent factor models

    Chengyu Cui and Gongjun Xu. Identifiability and inference for generalized latent factor models. arXiv preprint arXiv:2508.05866, 2025

  19. [19]

    Beyond vintage rotation: Bias-free sparse representation learning with oracle inference.arXiv preprint arXiv:2602.22590, 2026

    Chengyu Cui, Yunxiao Chen, Jing Ouyang, and Gongjun Xu. Beyond vintage rotation: Bias-free sparse representation learning with oracle inference.arXiv preprint arXiv:2602.22590, 2026

  20. [20]

    An overview of low-rank matrix recovery from incomplete observations.IEEE Journal of Selected Topics in Signal Processing, 10(4):608–622, 2016

    Mark A Davenport and Justin Romberg. An overview of low-rank matrix recovery from incomplete observations.IEEE Journal of Selected Topics in Signal Processing, 10(4):608–622, 2016

  21. [21]

    Davenport, Yaniv Plan, Ewout van den Berg, and Mary Wootters

    Mark A. Davenport, Yaniv Plan, Ewout van den Berg, and Mary Wootters. 1-bit matrix completion.Information and Inference: A Journal of the IMA, 3(3):189–223, 2014

  22. [22]

    Algorithmic regularization in learning deep homogeneous models: Layers are automatically balanced.Advances in Neural Information Processing Systems, 31, 2018

    Simon S Du, Wei Hu, and Jason D Lee. Algorithmic regularization in learning deep homogeneous models: Layers are automatically balanced.Advances in Neural Information Processing Systems, 31, 2018

  23. [23]

    Matrix completion has no spurious local minimum

    Rong Ge, Jason D Lee, and Tengyu Ma. Matrix completion has no spurious local minimum. Advances in Neural Information Processing Systems, 29, 2016

  24. [24]

    No spurious local minima in nonconvex low rank problems: A unified geometric analysis

    Rong Ge, Chi Jin, and Yi Zheng. No spurious local minima in nonconvex low rank problems: A unified geometric analysis. InInternational Conference on Machine Learning, pages 1233–1242. PMLR, 2017

  25. [25]

    LoRA: Low-rank adaptation of large language models.International Conference on Learning Representations, 1(2):3, 2022

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models.International Conference on Learning Representations, 1(2):3, 2022

  26. [26]

    Guaranteed rank minimization via singular value projection.Advances in Neural Information Processing Systems, 23, 2010

    Prateek Jain, Raghu Meka, and Inderjit Dhillon. Guaranteed rank minimization via singular value projection.Advances in Neural Information Processing Systems, 23, 2010

  27. [27]

    Matrix completion from a few entries.IEEE Transactions on Information Theory, 56(6):2980–2998, 2010

    Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries.IEEE Transactions on Information Theory, 56(6):2980–2998, 2010

  28. [28]

    Springer Science & Business Media, 2012

    Serge Lang.Real and functional analysis, volume 142. Springer Science & Business Media, 2012

  29. [29]

    Consistency of spectral clustering in stochastic block models

    Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015. 13

  30. [30]

    and ZHU, J

    Jinming Li, Shihao Wu, Chengyu Cui, Gongjun Xu, and Ji Zhu. Statistical inference on latent space models for network data.arXiv preprint arXiv:2312.06605v3, 2025

  31. [31]

    The global geometry of centralized and distributed low-rank matrix recovery without regularization.IEEE Signal Processing Letters, 27:1400–1404, 2020

    Shuang Li, Qiuwei Li, Zhihui Zhu, Gongguo Tang, and Michael B Wakin. The global geometry of centralized and distributed low-rank matrix recovery without regularization.IEEE Signal Processing Letters, 27:1400–1404, 2020

  32. [32]

    Non-convex low-rank matrix recovery with arbitrary outliers via median-truncated gradient descent.Information and Inference: A Journal of the IMA, 9(2):289–325, 2020

    Yuanxin Li, Yuejie Chi, Huishuai Zhang, and Yingbin Liang. Non-convex low-rank matrix recovery with arbitrary outliers via median-truncated gradient descent.Information and Inference: A Journal of the IMA, 9(2):289–325, 2020

  33. [33]

    Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion

    Cong Ma, Kaizheng Wang, Yuejie Chi, and Yuxin Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion. InInternational Conference on Machine Learning, pages 3345–3354. PMLR, 2018

  34. [34]

    Beyond Procrustes: Balancing-free gradient descent for asymmetric low-rank matrix sensing.IEEE Transactions on Signal Processing, 69:867–877, 2021

    Cong Ma, Yuanxin Li, and Yuejie Chi. Beyond Procrustes: Balancing-free gradient descent for asymmetric low-rank matrix sensing.IEEE Transactions on Signal Processing, 69:867–877, 2021

  35. [35]

    Universal latent space model fitting for large networks with edge covariates.Journal of Machine Learning Research, 21(4):1–67, 2020

    Zhuang Ma, Zongming Ma, and Hongsong Yuan. Universal latent space model fitting for large networks with edge covariates.Journal of Machine Learning Research, 21(4):1–67, 2020

  36. [36]

    Perturbation bounds for the polar decomposition.SIAM Journal on Matrix Analysis and Applications, 14(2):588–597, 1993

    Roy Mathias. Perturbation bounds for the polar decomposition.SIAM Journal on Matrix Analysis and Applications, 14(2):588–597, 1993

  37. [37]

    Phase retrieval using alternating minimization.Advances in Neural Information Processing Systems, 26, 2013

    Praneeth Netrapalli, Prateek Jain, and Sujay Sanghavi. Phase retrieval using alternating minimization.Advances in Neural Information Processing Systems, 26, 2013

  38. [38]

    Finding low-rank solutions via nonconvex matrix factorization, efficiently and provably.SIAM Journal on Imaging Sciences, 11(4):2165–2204, 2018

    Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi. Finding low-rank solutions via nonconvex matrix factorization, efficiently and provably.SIAM Journal on Imaging Sciences, 11(4):2165–2204, 2018

  39. [39]

    Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization.SIAM Review, 52(3):471–501, 2010

    Benjamin Recht, Maryam Fazel, and Pablo A Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization.SIAM Review, 52(3):471–501, 2010

  40. [40]

    A generalized solution of the orthogonal Procrustes problem.Psychome- trika, 31(1):1–10, 1966

    Peter H Schönemann. A generalized solution of the orthogonal Procrustes problem.Psychome- trika, 31(1):1–10, 1966

  41. [41]

    Guaranteed matrix completion via non-convex factorization

    Ruoyu Sun and Zhi-Quan Luo. Guaranteed matrix completion via non-convex factorization. IEEE Transactions on Information Theory, 62(11):6535–6579, 2016

  42. [42]

    Ten Berge

    Jos M.F. Ten Berge. Orthogonal Procrustes rotation for two or more matrices.Psychometrika, 42(2):267–276, 1977

  43. [43]

    Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent.Journal of Machine Learning Research, 22(150):1–63, 2021

    Tian Tong, Cong Ma, and Yuejie Chi. Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent.Journal of Machine Learning Research, 22(150):1–63, 2021

  44. [44]

    An introduction to matrix concentration inequalities.Foundations and Trends in Machine Learning, 8:1–230, 2015

    Joel A Tropp. An introduction to matrix concentration inequalities.Foundations and Trends in Machine Learning, 8:1–230, 2015

  45. [45]

    Low-rank solutions of linear matrix equations via procrustes flow

    Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Ben Recht. Low-rank solutions of linear matrix equations via procrustes flow. InInternational Conference on Machine Learning, pages 964–973. PMLR, 2016. 14

  46. [46]

    Generalized low rank models

    Madeleine Udell, Corinne Horn, Reza Zadeh, and Stephen Boyd. Generalized low rank models. Foundations and Trends in Machine Learning, 9(1):1–118, 2016

  47. [47]

    A least squares estimate of satellite attitude.SIAM review, 7(3):409–409, 1965

    Grace Wahba. A least squares estimate of satellite attitude.SIAM review, 7(3):409–409, 1965

  48. [48]

    A unified computational and statistical framework for nonconvex low-rank matrix estimation

    Lingxiao Wang, Xiao Zhang, and Quanquan Gu. A unified computational and statistical framework for nonconvex low-rank matrix estimation. InArtificial Intelligence and Statistics, pages 981–990. PMLR, 2017

  49. [49]

    Linformer: Self-Attention with Linear Complexity

    Sinong Wang, Belinda Z Li, Madian Khabsa, Han Fang, and Hao Ma. Linformer: Self-attention with linear complexity.arXiv preprint arXiv:2006.04768, 2020

  50. [50]

    SVD-LLM: Truncation-aware singular value decomposition for large language model compression

    Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. SVD-LLM: Truncation-aware singular value decomposition for large language model compression. InInternational Conference on Learning Representations, 2025

  51. [51]

    Nyströmformer: A Nyström-based algorithm for approximating self-attention

    Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh. Nyströmformer: A Nyström-based algorithm for approximating self-attention. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 14138–14148, 2021

  52. [52]

    Adaptive budget allocation for parameter-efficient fine-tuning

    Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. Adaptive budget allocation for parameter-efficient fine-tuning. InInternational Conference on Learning Representations, 2023

  53. [53]

    LoRA-one: One-step full gradient could suffice for fine-tuning large language models, provably and efficiently

    Yuanhe Zhang, Fanghui Liu, and Yudong Chen. LoRA-one: One-step full gradient could suffice for fine-tuning large language models, provably and efficiently. InInternational Conference on Machine Learning, 2025

  54. [54]

    Galore: Memory-efficient LLM training by gradient low-rank projection

    Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, and Yuandong Tian. Galore: Memory-efficient LLM training by gradient low-rank projection. InInternational Conference on Machine Learning, 2024

  55. [55]

    Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent.arXiv preprint arXiv:1605.07051, 2016

    Qinqing Zheng and John Lafferty. Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent.arXiv preprint arXiv:1605.07051, 2016. 15 Contents A Example Details 16 B Additional Assumptions 19 C Proofs of Results under Symmetric Model 20 D Proof of Results under Asymmetric Model 46 E Proof of Examples 77 ...

  56. [56]

    For eachi∈[n], there is a row-wise LOO initializer(U0,−i,V 0,−i), measurable toσ Y −i,

  57. [57]

    Here,( Y −i)kℓ := Ykℓ1{k̸=i} + P ∗ kℓ1{k=i},fori, k∈ [n], ℓ∈ [q], and( Y −ℓ)kj := Ykj1{j̸=ℓ} + P ∗ kj1{j=ℓ},fork∈ [n], ℓ, j∈ [q]

    For eachℓ∈[q], there is a column-wise LOO initializer(U0,−ℓ,V 0,−ℓ), measurable toσ Y ,−ℓ . Here,( Y −i)kℓ := Ykℓ1{k̸=i} + P ∗ kℓ1{k=i},fori, k∈ [n], ℓ∈ [q], and( Y −ℓ)kj := Ykj1{j̸=ℓ} + P ∗ kj1{j=ℓ},fork∈ [n], ℓ, j∈ [q]. Let ϕ† nq := ϕnq + C ∆2(n,q,δ) ασmin and ψ† nq := ψnq + C ∆∞(n,q,δ) ασmin . Assume further that, for the same orthogonal matrixR0 in (1...

  58. [58]

    Using (C.3), we conclude γ1,t ≤ 1− 7 8 ηασmin ∥Et∥F.(C.4) We next boundγ2,t

    Thus all eigenvalues ofInr −η ¯At lie in [0,1), and therefore (Inr −η ¯At)vec(E⊺ t ) ≤ 1−ηλ min( ¯At) ∥Et∥F. Using (C.3), we conclude γ1,t ≤ 1− 7 8 ηασmin ∥Et∥F.(C.4) We next boundγ2,t. By the induction hypothesis, dist2(Zt,Z ∗) =∥E t∥F ≤ϕ n∥Z∗∥F ≤ϕ n √r∥Z ∗∥ ≤c 0∥Z∗∥, where the last step usesϕn ≤c 0α/(κ√r)and α/κ≤ 1. Since dist2(Zt(s),Z ∗) ≤ ∥E t∥F, Lemm...

  59. [59]

    The score at(¯U, ¯V)is bounded by R ¯Z ≤C∆ ∞(n, q, δ)ω∗.(E.1)

  60. [60]

    The auxiliary point is close to(eU, eV)in weighted Frobenius norm: ∥( ¯U− eU, ¯V− eV)∥ 2,F ≤C R ¯Z ασmin .(E.2)

  61. [61]

    (E.3) Proof.See Section F.6

    The quadratic noise terms satisfy ν⋆ ∥(P ∗ −Y)( ¯V−V ∗)∥2→∞ q ∨ ∥(P ∗ −Y) ⊺( ¯U−U ∗)∥2→∞ n ≤C n ν⋆L⋆ s β αnqσmin + ν⋆β ω∗ α(n∧q)σ min + ν2 ⋆ L⋆ ω∗ αnqσmin o ≤C∆ ∞(n, q, δ)ω∗. (E.3) Proof.See Section F.6. With this lemma, we writemaxi∈[n] ∥∆(U) 4,i ∥as max i∈[n] ∥∆(U) 4,i ∥=ν ⋆ (P ∗ −Y)E V 2→∞ ≤ν ⋆ (P ∗ −Y)( ¯V−V ∗) 2→∞ +ν ⋆ P ∗ −Y ¯V− eV F ≤C ∆∞ ασmin ω∗,...