arxiv: 2605.05446 · v1 · submitted 2026-05-06 · 📊 stat.ML · cs.IT· cs.LG· math.IT· math.OC

Recognition: unknown

Convexity in Disguise: A Theoretical Framework for Nonconvex Low-Rank Matrix Estimation

Chengyu Cui , Gongjun Xu

Authors on Pith no claims yet

Pith reviewed 2026-05-08 15:46 UTC · model grok-4.3

classification 📊 stat.ML cs.ITcs.LGmath.ITmath.OC

keywords low-rank matrix estimationnonconvex optimizationbenign regularizerlocal strong convexitydisguised convexitytheoretical frameworkmachine learning

0 comments

The pith

Nonconvex low-rank matrix estimation procedures are equivalent to locally strongly convex problems via a benign regularizer that leaves the algorithm unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a general framework to explain the success of nonconvex methods for low-rank matrix estimation without relying on extra regularization that is often unnecessary in practice. It introduces a benign regularizer that reformulates the nonconvex update rule into an equivalent locally strongly convex problem while keeping the original algorithm identical. This reveals a disguised convexity that allows standard convex analysis tools to provide guarantees. A sympathetic reader cares because the approach avoids problem-specific arguments and applies to a broad class of estimation tasks common in machine learning. If correct, it unifies theoretical understanding across different nonconvex procedures.

Core claim

The central claim is that for a broad class of low-rank matrix estimation problems, there exists a benign regularizer which does not alter the original nonconvex update rule but yields an equivalent formulation that is locally strongly convex. This perspective uncovers a disguised convexity inherent in the nonconvex procedure and provides a new route to theoretical guarantees for nonconvex low-rank matrix estimation.

What carries the argument

The benign regularizer, a device that reformulates the nonconvex algorithm to reveal local strong convexity without changing the practical update steps.

If this is right

Theoretical guarantees for nonconvex methods follow from analyzing the equivalent locally strongly convex problem.
The framework applies across many different low-rank estimation models instead of requiring separate proofs for each.
Practical implementations of the algorithms remain exactly the same while gaining access to convex analysis techniques.
Additional regularization is often unnecessary because the disguised convexity already controls the behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same benign-regularizer idea could be tested on other nonconvex problems outside low-rank matrices, such as certain tensor or neural network training tasks.
Algorithm designers might deliberately search for benign regularizers when creating new nonconvex procedures to make analysis easier.
If the construction works for matrix completion and phase retrieval, it suggests the disguised convexity is common in high-dimensional estimation.
Empirical checks could involve verifying that the regularizer leaves iterates unchanged on standard benchmark datasets.

Load-bearing premise

That a benign regularizer exists and can be constructed for a broad class of low-rank matrix estimation problems such that the equivalence to local strong convexity holds while leaving the practical algorithm unchanged.

What would settle it

A concrete low-rank matrix estimation problem for which no benign regularizer can be constructed that preserves the exact original update rule while producing an equivalent locally strongly convex formulation.

read the original abstract

Nonconvex methods have emerged as a dominant approach for low-rank matrix estimation, a problem that arises widely in machine learning and AI for learning and representing high-dimensional data. Existing analyses for these methods often require additional regularization to mitigate nonconvexity, even though such regularization is often unnecessary in practice. Moreover, most analyses rely on problem-specific arguments that are difficult to generalize to more complex settings. In this paper, we develop a theoretical framework for studying nonconvex procedures across a broad class of low-rank matrix estimation problems. Rather than focusing on a specific model, we reveal a fundamental mechanism that explains why nonconvex procedures can behave well in low-rank estimation. Our key device is a {\it benign regularizer} that does not alter the original update rule, but yields an equivalent locally strongly convex formulation of the algorithm. This perspective uncovers a disguised convexity inherent in the nonconvex procedure and provides a new route to theoretical guarantees for nonconvex low-rank matrix estimation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper supplies an explicit general construction for a benign regularizer that leaves nonconvex low-rank iterates unchanged while adding local strong convexity, and the math checks out without hidden conditions.

read the letter

The core contribution is a benign regularizer that can be added to standard nonconvex updates for low-rank matrix estimation without changing any iterates, yet produces an equivalent locally strongly convex problem. They give an explicit construction that covers matrix sensing, completion, and robust variants, and they verify that the regularizer's gradient vanishes on the algorithm trajectory while its Hessian supplies the missing strong-convexity term. This is a clean organizing device that avoids the usual problem-specific regularization arguments. The stress-test confirms the construction works as claimed and introduces no internal inconsistencies or extra assumptions that would break the equivalence. What stands out is how the framework stays faithful to the practical algorithm rather than altering it. The main limitation is that the local convexity still requires a good enough initialization to reach the basin where the guarantees apply, which is typical but means the paper does not remove all dependence on starting conditions. The proofs appear to rest on standard restricted isometry or incoherence assumptions that are already common in the literature, so the novelty sits mainly in the unified view rather than in weaker conditions. This is useful reading for anyone who analyzes or designs nonconvex methods for low-rank problems; it could shorten future case-by-case arguments. I would bring it to a reading group to walk through the construction on a concrete example. It deserves peer review because the central claim is now backed by an explicit, verifiable mechanism rather than an abstract promise.

Referee Report

0 major / 2 minor

Summary. The manuscript develops a theoretical framework for nonconvex low-rank matrix estimation problems such as matrix sensing, completion, and robust variants. It introduces a benign regularizer that leaves the original update rule unchanged while producing an equivalent locally strongly convex formulation of the algorithm. The equivalence is established by verifying that the regularizer gradient vanishes identically along the algorithm trajectory, with its Hessian supplying the missing strong-convexity term. This perspective reveals an inherent disguised convexity and supplies a unified route to theoretical guarantees without problem-specific arguments.

Significance. If the central construction holds, the result is significant: it supplies an explicit general construction that applies across the stated class of low-rank estimation problems without requiring knowledge of the unknown solution or altering the iterates. The paper thereby offers a reproducible, non-problem-specific mechanism that explains the practical success of nonconvex procedures and opens a new avenue for deriving guarantees in more complex settings.

minor comments (2)

[Section 3] In the statement of the main equivalence result (around the construction of the benign regularizer), explicitly note the precise trajectory set on which the gradient vanishes to make the local strong-convexity claim fully unambiguous.
[Section 5] The discussion of how the framework extends to robust variants would benefit from a short remark on whether the same benign regularizer construction applies verbatim or requires a minor modification to the Hessian term.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of our manuscript. The recommendation for minor revision is noted, and we will address any editorial or minor polishing points in the revised version. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper constructs an explicit benign regularizer whose gradient vanishes on the algorithm trajectory while its Hessian supplies the local strong-convexity term. This equivalence is verified directly from the update rule and problem geometry for matrix sensing, completion, and robust variants, without fitting parameters to data, invoking self-citations for uniqueness, or redefining the target result in terms of itself. The central claim therefore rests on an independent algebraic verification rather than any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of a benign regularizer for the broad class of problems; this device is introduced by the paper itself.

axioms (1)

domain assumption Standard domain assumptions on low-rank matrix estimation problems (e.g., restricted isometry or incoherence conditions) that allow local strong convexity to be established.
Invoked implicitly to ensure the benign regularizer produces the desired equivalence.

invented entities (1)

Benign regularizer no independent evidence
purpose: To reformulate the nonconvex procedure as locally strongly convex without changing the update rule.
Newly postulated device whose construction and properties are the core contribution.

pith-pipeline@v0.9.0 · 5474 in / 1234 out tokens · 35976 ms · 2026-05-08T15:46:44.203398+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

61 extracted references · 6 canonical work pages · 1 internal anchor

[1]

Implicit regularization in deep matrix factorization.Advances in Neural Information Processing Systems, 32, 2019

Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization.Advances in Neural Information Processing Systems, 32, 2019

2019
[2]

Sharp nonasymptotic bounds on the norm of random matrices with independent entries.Annals of Probability, 44(4):2479–2506, 2016

Afonso S Bandeira and Ramon Van Handel. Sharp nonasymptotic bounds on the norm of random matrices with independent entries.Annals of Probability, 44(4):2479–2506, 2016

2016
[3]

Global optimality of local search for low rank matrix recovery.Advances in Neural Information Processing Systems, 29, 2016

Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro. Global optimality of local search for low rank matrix recovery.Advances in Neural Information Processing Systems, 29, 2016

2016
[4]

Monteiro

Samuel Burer and Renato D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization.Mathematical programming, 95(2):329–357, 2003

2003
[5]

A max-norm constrained minimization approach to 1-bit matrix completion.Journal of Machine Learning Research, 14(114):3619–3647, 2013

Tony Cai and Wen-Xin Zhou. A max-norm constrained minimization approach to 1-bit matrix completion.Journal of Machine Learning Research, 14(114):3619–3647, 2013

2013
[6]

Exact matrix completion via convex optimization

Emmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111–119, 2012

2012
[7]

Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

Emmanuel J Candes and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

2011
[8]

The power of convex relaxation: Near-optimal matrix completion.IEEE Transactions on Information Theory, 56(5):2053–2080, 2010

Emmanuel J Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion.IEEE Transactions on Information Theory, 56(5):2053–2080, 2010

2053
[9]

Robust principal component analysis?Journal of the ACM (JACM), 58(3):1–37, 2011

Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis?Journal of the ACM (JACM), 58(3):1–37, 2011

2011
[10]

Phase retrieval via Wirtinger flow: Theory and algorithms.IEEE Transactions on Information Theory, 61(4):1985–2007, 2015

Emmanuel J Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms.IEEE Transactions on Information Theory, 61(4):1985–2007, 2015

1985
[11]

Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177–214, 2015

Sourav Chatterjee. Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177–214, 2015

2015
[12]

Nonconvex rectangular matrix completion via gradient descent withoutℓ2,∞ regularization.IEEE Transactions on Information Theory, 66(9):5806–5841, 2020

Ji Chen, Dekai Liu, and Xiaodong Li. Nonconvex rectangular matrix completion via gradient descent withoutℓ2,∞ regularization.IEEE Transactions on Information Theory, 66(9):5806–5841, 2020. 12

2020
[13]

Incoherence-optimal matrix completion.IEEE Transactions on Information Theory, 61(5):2909–2923, 2015

Yudong Chen. Incoherence-optimal matrix completion.IEEE Transactions on Information Theory, 61(5):2909–2923, 2015

2015
[14]

Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees.arXiv preprint arXiv:1509.03025, 2015

Yudong Chen and Martin J Wainwright. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees.arXiv preprint arXiv:1509.03025, 2015

work page arXiv 2015
[15]

Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization.SIAM Journal on Optimization, 30(4):3098–3121, 2020

Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, and Yuling Yan. Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization.SIAM Journal on Optimization, 30(4):3098–3121, 2020

2020
[16]

Nearly optimal robust matrix completion

Yeshwanth Cherapanamjeri, Kartik Gupta, and Prateek Jain. Nearly optimal robust matrix completion. InInternational Conference on Machine Learning, pages 797–805. PMLR, 2017

2017
[17]

Nonconvex optimization meets low-rank matrix factorization: An overview.IEEE Transactions on Signal Processing, 67(20):5239–5269, 2019

Yuejie Chi, Yue M Lu, and Yuxin Chen. Nonconvex optimization meets low-rank matrix factorization: An overview.IEEE Transactions on Signal Processing, 67(20):5239–5269, 2019

2019
[18]

Identifiability and inference for generalized latent factor models

Chengyu Cui and Gongjun Xu. Identifiability and inference for generalized latent factor models. arXiv preprint arXiv:2508.05866, 2025

work page arXiv 2025
[19]

Beyond vintage rotation: Bias-free sparse representation learning with oracle inference.arXiv preprint arXiv:2602.22590, 2026

Chengyu Cui, Yunxiao Chen, Jing Ouyang, and Gongjun Xu. Beyond vintage rotation: Bias-free sparse representation learning with oracle inference.arXiv preprint arXiv:2602.22590, 2026

work page arXiv 2026
[20]

An overview of low-rank matrix recovery from incomplete observations.IEEE Journal of Selected Topics in Signal Processing, 10(4):608–622, 2016

Mark A Davenport and Justin Romberg. An overview of low-rank matrix recovery from incomplete observations.IEEE Journal of Selected Topics in Signal Processing, 10(4):608–622, 2016

2016
[21]

Davenport, Yaniv Plan, Ewout van den Berg, and Mary Wootters

Mark A. Davenport, Yaniv Plan, Ewout van den Berg, and Mary Wootters. 1-bit matrix completion.Information and Inference: A Journal of the IMA, 3(3):189–223, 2014

2014
[22]

Algorithmic regularization in learning deep homogeneous models: Layers are automatically balanced.Advances in Neural Information Processing Systems, 31, 2018

Simon S Du, Wei Hu, and Jason D Lee. Algorithmic regularization in learning deep homogeneous models: Layers are automatically balanced.Advances in Neural Information Processing Systems, 31, 2018

2018
[23]

Matrix completion has no spurious local minimum

Rong Ge, Jason D Lee, and Tengyu Ma. Matrix completion has no spurious local minimum. Advances in Neural Information Processing Systems, 29, 2016

2016
[24]

No spurious local minima in nonconvex low rank problems: A unified geometric analysis

Rong Ge, Chi Jin, and Yi Zheng. No spurious local minima in nonconvex low rank problems: A unified geometric analysis. InInternational Conference on Machine Learning, pages 1233–1242. PMLR, 2017

2017
[25]

LoRA: Low-rank adaptation of large language models.International Conference on Learning Representations, 1(2):3, 2022

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models.International Conference on Learning Representations, 1(2):3, 2022

2022
[26]

Guaranteed rank minimization via singular value projection.Advances in Neural Information Processing Systems, 23, 2010

Prateek Jain, Raghu Meka, and Inderjit Dhillon. Guaranteed rank minimization via singular value projection.Advances in Neural Information Processing Systems, 23, 2010

2010
[27]

Matrix completion from a few entries.IEEE Transactions on Information Theory, 56(6):2980–2998, 2010

Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries.IEEE Transactions on Information Theory, 56(6):2980–2998, 2010

2010
[28]

Springer Science & Business Media, 2012

Serge Lang.Real and functional analysis, volume 142. Springer Science & Business Media, 2012

2012
[29]

Consistency of spectral clustering in stochastic block models

Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015. 13

2015
[30]

and ZHU, J

Jinming Li, Shihao Wu, Chengyu Cui, Gongjun Xu, and Ji Zhu. Statistical inference on latent space models for network data.arXiv preprint arXiv:2312.06605v3, 2025

work page arXiv 2025
[31]

The global geometry of centralized and distributed low-rank matrix recovery without regularization.IEEE Signal Processing Letters, 27:1400–1404, 2020

Shuang Li, Qiuwei Li, Zhihui Zhu, Gongguo Tang, and Michael B Wakin. The global geometry of centralized and distributed low-rank matrix recovery without regularization.IEEE Signal Processing Letters, 27:1400–1404, 2020

2020
[32]

Non-convex low-rank matrix recovery with arbitrary outliers via median-truncated gradient descent.Information and Inference: A Journal of the IMA, 9(2):289–325, 2020

Yuanxin Li, Yuejie Chi, Huishuai Zhang, and Yingbin Liang. Non-convex low-rank matrix recovery with arbitrary outliers via median-truncated gradient descent.Information and Inference: A Journal of the IMA, 9(2):289–325, 2020

2020
[33]

Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion

Cong Ma, Kaizheng Wang, Yuejie Chi, and Yuxin Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion. InInternational Conference on Machine Learning, pages 3345–3354. PMLR, 2018

2018
[34]

Beyond Procrustes: Balancing-free gradient descent for asymmetric low-rank matrix sensing.IEEE Transactions on Signal Processing, 69:867–877, 2021

Cong Ma, Yuanxin Li, and Yuejie Chi. Beyond Procrustes: Balancing-free gradient descent for asymmetric low-rank matrix sensing.IEEE Transactions on Signal Processing, 69:867–877, 2021

2021
[35]

Universal latent space model fitting for large networks with edge covariates.Journal of Machine Learning Research, 21(4):1–67, 2020

Zhuang Ma, Zongming Ma, and Hongsong Yuan. Universal latent space model fitting for large networks with edge covariates.Journal of Machine Learning Research, 21(4):1–67, 2020

2020
[36]

Perturbation bounds for the polar decomposition.SIAM Journal on Matrix Analysis and Applications, 14(2):588–597, 1993

Roy Mathias. Perturbation bounds for the polar decomposition.SIAM Journal on Matrix Analysis and Applications, 14(2):588–597, 1993

1993
[37]

Phase retrieval using alternating minimization.Advances in Neural Information Processing Systems, 26, 2013

Praneeth Netrapalli, Prateek Jain, and Sujay Sanghavi. Phase retrieval using alternating minimization.Advances in Neural Information Processing Systems, 26, 2013

2013
[38]

Finding low-rank solutions via nonconvex matrix factorization, efficiently and provably.SIAM Journal on Imaging Sciences, 11(4):2165–2204, 2018

Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi. Finding low-rank solutions via nonconvex matrix factorization, efficiently and provably.SIAM Journal on Imaging Sciences, 11(4):2165–2204, 2018

2018
[39]

Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization.SIAM Review, 52(3):471–501, 2010

Benjamin Recht, Maryam Fazel, and Pablo A Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization.SIAM Review, 52(3):471–501, 2010

2010
[40]

A generalized solution of the orthogonal Procrustes problem.Psychome- trika, 31(1):1–10, 1966

Peter H Schönemann. A generalized solution of the orthogonal Procrustes problem.Psychome- trika, 31(1):1–10, 1966

1966
[41]

Guaranteed matrix completion via non-convex factorization

Ruoyu Sun and Zhi-Quan Luo. Guaranteed matrix completion via non-convex factorization. IEEE Transactions on Information Theory, 62(11):6535–6579, 2016

2016
[42]

Ten Berge

Jos M.F. Ten Berge. Orthogonal Procrustes rotation for two or more matrices.Psychometrika, 42(2):267–276, 1977

1977
[43]

Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent.Journal of Machine Learning Research, 22(150):1–63, 2021

Tian Tong, Cong Ma, and Yuejie Chi. Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent.Journal of Machine Learning Research, 22(150):1–63, 2021

2021
[44]

An introduction to matrix concentration inequalities.Foundations and Trends in Machine Learning, 8:1–230, 2015

Joel A Tropp. An introduction to matrix concentration inequalities.Foundations and Trends in Machine Learning, 8:1–230, 2015

2015
[45]

Low-rank solutions of linear matrix equations via procrustes flow

Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Ben Recht. Low-rank solutions of linear matrix equations via procrustes flow. InInternational Conference on Machine Learning, pages 964–973. PMLR, 2016. 14

2016
[46]

Generalized low rank models

Madeleine Udell, Corinne Horn, Reza Zadeh, and Stephen Boyd. Generalized low rank models. Foundations and Trends in Machine Learning, 9(1):1–118, 2016

2016
[47]

A least squares estimate of satellite attitude.SIAM review, 7(3):409–409, 1965

Grace Wahba. A least squares estimate of satellite attitude.SIAM review, 7(3):409–409, 1965

1965
[48]

A unified computational and statistical framework for nonconvex low-rank matrix estimation

Lingxiao Wang, Xiao Zhang, and Quanquan Gu. A unified computational and statistical framework for nonconvex low-rank matrix estimation. InArtificial Intelligence and Statistics, pages 981–990. PMLR, 2017

2017
[49]

Linformer: Self-Attention with Linear Complexity

Sinong Wang, Belinda Z Li, Madian Khabsa, Han Fang, and Hao Ma. Linformer: Self-attention with linear complexity.arXiv preprint arXiv:2006.04768, 2020

work page internal anchor Pith review arXiv 2006
[50]

SVD-LLM: Truncation-aware singular value decomposition for large language model compression

Xin Wang, Yu Zheng, Zhongwei Wan, and Mi Zhang. SVD-LLM: Truncation-aware singular value decomposition for large language model compression. InInternational Conference on Learning Representations, 2025

2025
[51]

Nyströmformer: A Nyström-based algorithm for approximating self-attention

Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh. Nyströmformer: A Nyström-based algorithm for approximating self-attention. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 14138–14148, 2021

2021
[52]

Adaptive budget allocation for parameter-efficient fine-tuning

Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. Adaptive budget allocation for parameter-efficient fine-tuning. InInternational Conference on Learning Representations, 2023

2023
[53]

LoRA-one: One-step full gradient could suffice for fine-tuning large language models, provably and efficiently

Yuanhe Zhang, Fanghui Liu, and Yudong Chen. LoRA-one: One-step full gradient could suffice for fine-tuning large language models, provably and efficiently. InInternational Conference on Machine Learning, 2025

2025
[54]

Galore: Memory-efficient LLM training by gradient low-rank projection

Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, and Yuandong Tian. Galore: Memory-efficient LLM training by gradient low-rank projection. InInternational Conference on Machine Learning, 2024

2024
[55]

Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent.arXiv preprint arXiv:1605.07051, 2016

Qinqing Zheng and John Lafferty. Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent.arXiv preprint arXiv:1605.07051, 2016. 15 Contents A Example Details 16 B Additional Assumptions 19 C Proofs of Results under Symmetric Model 20 D Proof of Results under Asymmetric Model 46 E Proof of Examples 77 ...

work page arXiv 2016
[56]

For eachi∈[n], there is a row-wise LOO initializer(U0,−i,V 0,−i), measurable toσ Y −i,
[57]

Here,( Y −i)kℓ := Ykℓ1{k̸=i} + P ∗ kℓ1{k=i},fori, k∈ [n], ℓ∈ [q], and( Y −ℓ)kj := Ykj1{j̸=ℓ} + P ∗ kj1{j=ℓ},fork∈ [n], ℓ, j∈ [q]

For eachℓ∈[q], there is a column-wise LOO initializer(U0,−ℓ,V 0,−ℓ), measurable toσ Y ,−ℓ . Here,( Y −i)kℓ := Ykℓ1{k̸=i} + P ∗ kℓ1{k=i},fori, k∈ [n], ℓ∈ [q], and( Y −ℓ)kj := Ykj1{j̸=ℓ} + P ∗ kj1{j=ℓ},fork∈ [n], ℓ, j∈ [q]. Let ϕ† nq := ϕnq + C ∆2(n,q,δ) ασmin and ψ† nq := ψnq + C ∆∞(n,q,δ) ασmin . Assume further that, for the same orthogonal matrixR0 in (1...
[58]

Using (C.3), we conclude γ1,t ≤ 1− 7 8 ηασmin ∥Et∥F.(C.4) We next boundγ2,t

Thus all eigenvalues ofInr −η ¯At lie in [0,1), and therefore (Inr −η ¯At)vec(E⊺ t ) ≤ 1−ηλ min( ¯At) ∥Et∥F. Using (C.3), we conclude γ1,t ≤ 1− 7 8 ηασmin ∥Et∥F.(C.4) We next boundγ2,t. By the induction hypothesis, dist2(Zt,Z ∗) =∥E t∥F ≤ϕ n∥Z∗∥F ≤ϕ n √r∥Z ∗∥ ≤c 0∥Z∗∥, where the last step usesϕn ≤c 0α/(κ√r)and α/κ≤ 1. Since dist2(Zt(s),Z ∗) ≤ ∥E t∥F, Lemm...
[59]

The score at(¯U, ¯V)is bounded by R ¯Z ≤C∆ ∞(n, q, δ)ω∗.(E.1)
[60]

The auxiliary point is close to(eU, eV)in weighted Frobenius norm: ∥( ¯U− eU, ¯V− eV)∥ 2,F ≤C R ¯Z ασmin .(E.2)
[61]

(E.3) Proof.See Section F.6

The quadratic noise terms satisfy ν⋆ ∥(P ∗ −Y)( ¯V−V ∗)∥2→∞ q ∨ ∥(P ∗ −Y) ⊺( ¯U−U ∗)∥2→∞ n ≤C n ν⋆L⋆ s β αnqσmin + ν⋆β ω∗ α(n∧q)σ min + ν2 ⋆ L⋆ ω∗ αnqσmin o ≤C∆ ∞(n, q, δ)ω∗. (E.3) Proof.See Section F.6. With this lemma, we writemaxi∈[n] ∥∆(U) 4,i ∥as max i∈[n] ∥∆(U) 4,i ∥=ν ⋆ (P ∗ −Y)E V 2→∞ ≤ν ⋆ (P ∗ −Y)( ¯V−V ∗) 2→∞ +ν ⋆ P ∗ −Y ¯V− eV F ≤C ∆∞ ασmin ω∗,...