Sharp recovery and landscape guarantees for the nonconvex matrix LASSO

Andrew D. McRae; Richard Y. Zhang

arxiv: 2604.19381 · v1 · submitted 2026-04-21 · 🧮 math.OC · math.ST· stat.TH

Sharp recovery and landscape guarantees for the nonconvex matrix LASSO

Andrew D. McRae , Richard Y. Zhang This is my paper

Pith reviewed 2026-05-10 02:07 UTC · model grok-4.3

classification 🧮 math.OC math.STstat.TH

keywords nonconvex optimizationmatrix recoveryrestricted isometry propertynuclear norm regularizationlow-rank matricessecond-order critical pointsoverparametrization

0 comments

The pith

Second-order critical points of the factored nonconvex matrix LASSO recover low-rank signals at rates that interpolate between convex and unregularized nonconvex bounds under RIP.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops sharp recovery guarantees for second-order critical points of the factored nonconvex formulation of the matrix LASSO, with nuclear-norm regularization. These bounds are statistically optimal and hold in the overparametrized regime where the factorization rank exceeds the ground truth rank. The results show how the regularization strength controls the error, bridging the classical convex rate and faster unregularized nonconvex rates. The analysis also produces counterexamples where overparametrization fails to improve the landscape despite RIP. All claims extend to general convex losses under restricted strong convexity and smoothness.

Core claim

We develop a sharp and statistically optimal theory for second-order critical points of the factored nonconvex matrix LASSO under RIP with particular emphasis on the overparametrized regime where the search rank r exceeds the ground-truth rank r*. Our recovery error bounds reveal the precise role of nuclear norm regularization, interpolating between the classical convex rate and known rates for the unregularized nonconvex problem. Complementing this positive result, we give examples showing that rank overparametrization does not always improve the optimization landscape even under RIP. All of our results generalize to arbitrary convex functions with nuclear-norm regularization under RSC and,

What carries the argument

The factored nonconvex formulation of the nuclear-norm-regularized least-squares estimator, analyzed at its second-order critical points under the restricted isometry property.

Load-bearing premise

The measurement operator satisfies the restricted isometry property at the relevant rank together with restricted strong convexity and smoothness for the convex loss.

What would settle it

A concrete low-rank matrix recovery instance where a second-order critical point of the nonconvex LASSO exceeds the interpolated error bound while the RIP constant remains below the threshold required by the theory.

Figures

Figures reproduced from arXiv: 2604.19381 by Andrew D. McRae, Richard Y. Zhang.

**Figure 1.** Figure 1: Average error of ∥UV ⊤ − M∗∥F for local optimization of (4) with Gaussian measurements, different values of the search rank parameter r, and random initialization of (U, V ). We used d1 = 50, d2 = 51, r∗ = 2, and n = ⌈2.35r∗(d1 +d2)⌉ = 475 Gaussian measurements. All nonzero singular values of M∗ are 1. We chose ξ = 0 and λ = 0.0001 (nonzero to ensure a unique global optimum), and A is scaled such that E∥A(… view at source ↗

read the original abstract

Low-rank matrix recovery can be solved to statistical optimality by convex matrix optimization under the classical assumption of restricted isometry property (RIP). However, for large problems, the convex formulation is commonly replaced by a smooth rank-constrained factored nonconvex problem for which algorithmic theory typically only guarantees convergence to second-order critical points. In this paper, we develop a sharp and statistically optimal theory for second-order critical points of the factored nonconvex matrix LASSO (nuclear-norm--regularized least-squares estimator) under RIP with particular emphasis on the overparametrized regime where the search rank $r$ exceeds the ground-truth rank $r_*$. Our recovery error bounds reveal the precise role of nuclear norm regularization, interpolating between the classical convex rate and known rates for the unregularized nonconvex problem. Complementing this positive result, we give examples showing that, contrary to popular belief, rank overparametrization does not always improve the optimization landscape even under RIP. This negative result raises questions about the fundamental statistical recovery capability of rank-constrained nonconvex approaches in comparison to convex approaches which have worse computational scaling. All of our results generalize to arbitrary convex functions with nuclear-norm regularization under restricted strong convexity and smoothness. In particular, we give sharp conditions under which second-order critical points of the nonconvex problem either (1) approximately recover low-rank approximate minima of the convex problem or (2) exactly recover a low-rank global optimum if one exists.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives sharp interpolation bounds for second-order critical points of the factored nonconvex matrix LASSO under RIP and supplies counterexamples showing overparametrization does not always improve the landscape.

read the letter

The main thing to know is that this work pins down how nuclear-norm regularization affects recovery error at second-order critical points in the overparametrized regime, producing bounds that sit between the convex matrix LASSO rate and the unregularized nonconvex rate. It also shows that increasing the search rank past the true rank does not reliably smooth the landscape even when RIP holds. Both results extend to general convex losses under restricted strong convexity and smoothness, with explicit conditions for when the nonconvex points recover convex minima or exact global optima. That interpolation and the negative examples on overparametrization are the genuinely new pieces; prior work had separate analyses for the convex and unregularized cases but not this controlled comparison inside the same factored formulation. The generalization beyond least squares is also cleanly stated. The RIP assumption at rank roughly r plus r-star carries the load, as the stress-test note flags, and the abstract claims the proofs avoid extra factors that would grow with the overparametrization ratio. Without the full derivations it is hard to confirm the constants stay tight, but the stated claims do not show obvious circularity or post-hoc fitting. The counterexamples are asserted rather than constructed in the abstract, so their explicitness will matter for readers who want to check tightness. This is written for people who already work on low-rank recovery and nonconvex optimization in signal processing or statistical estimation. Anyone deciding between convex and factored formulations for large problems will find the comparison useful. The technical specificity and the mix of positive and negative results are enough to justify sending it to a serious referee rather than desk-rejecting it.

Referee Report

3 major / 2 minor

Summary. The manuscript develops sharp recovery guarantees for second-order critical points of the factored nonconvex matrix LASSO (nuclear-norm regularized least squares) under the restricted isometry property (RIP) of the measurement operator, with emphasis on the overparametrized regime where the search rank r exceeds the ground-truth rank r_*. The central claims are that the resulting error bounds are statistically optimal and interpolate between the classical convex matrix LASSO rate and known unregularized nonconvex rates; that rank overparametrization does not always improve the landscape even under RIP (with counterexamples); and that the results extend to arbitrary convex losses under restricted strong convexity/smoothness, yielding sharp conditions under which second-order critical points either approximately recover low-rank approximate minima of the convex problem or exactly recover a low-rank global optimum if one exists.

Significance. If the claims hold, the work supplies a precise, statistically optimal characterization of how nuclear-norm regularization affects both recovery error and the optimization landscape in factored nonconvex formulations. The interpolation result and the negative landscape examples are notable contributions that clarify the trade-offs between convex and nonconvex approaches in the overparametrized regime. The generalization to general convex losses under RSC/RSS broadens the scope. The paper provides explicit conditions and counterexamples, which are strengths when the derivations are tight.

major comments (3)

[Main recovery theorem] Main recovery theorem (likely §3 or Theorem 3.1): the claimed sharp interpolation of recovery bounds without extra factors that grow with the overparametrization ratio r/r_* requires that RIP at rank roughly r + r_* directly produces restricted strong convexity/smoothness constants independent of r/r_*. The derivation must explicitly rule out degradation of these constants in the overparametrized regime; otherwise the bounds are not uniformly sharp as stated.
[Negative landscape examples] Negative landscape examples (likely §4 or §5): the construction details for the counterexamples showing that overparametrization does not always improve the landscape under RIP are asserted but lack sufficient explicit construction (e.g., specific measurement operators or loss functions) to verify that they satisfy RIP while producing the claimed bad landscape properties; this is load-bearing for the complementing negative result.
[Generalization section] Generalization to arbitrary convex losses (likely §6): the sharp conditions under which second-order critical points recover low-rank approximate minima of the convex problem or exact global optima must be checked for hidden dependencies on r/r_* that would undermine the interpolation claim in the overparametrized regime.

minor comments (2)

[Abstract] Abstract: the phrase 'all of our results generalize' would benefit from a brief parenthetical specifying the precise class of convex functions (beyond 'arbitrary convex functions with nuclear-norm regularization').
[Notation] Notation and assumptions: clarify the exact rank at which RIP is assumed (e.g., 2r or r + r_*) and ensure the definition of 'relevant rank' is stated uniformly before the main theorems.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of the contributions, and constructive suggestions. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.

read point-by-point responses

Referee: [Main recovery theorem] Main recovery theorem (likely §3 or Theorem 3.1): the claimed sharp interpolation of recovery bounds without extra factors that grow with the overparametrization ratio r/r_* requires that RIP at rank roughly r + r_* directly produces restricted strong convexity/smoothness constants independent of r/r_*. The derivation must explicitly rule out degradation of these constants in the overparametrized regime; otherwise the bounds are not uniformly sharp as stated.

Authors: We agree that an explicit statement ruling out degradation is helpful for clarity. In the proof of Theorem 3.1, the RIP is invoked at rank r + r_* on the difference between the factored variables and the ground truth; the resulting restricted strong convexity and smoothness constants depend solely on the RIP constant δ_{r+r_*} and are therefore independent of the ratio r/r_*. Overparametrization affects only the search space dimension, not the isometry constants themselves. We will add a dedicated remark immediately after Theorem 3.1 and expand the proof paragraph to highlight this independence explicitly. revision: yes
Referee: [Negative landscape examples] Negative landscape examples (likely §4 or §5): the construction details for the counterexamples showing that overparametrization does not always improve the landscape under RIP are asserted but lack sufficient explicit construction (e.g., specific measurement operators or loss functions) to verify that they satisfy RIP while producing the claimed bad landscape properties; this is load-bearing for the complementing negative result.

Authors: We acknowledge that the counterexamples would be easier to verify with more concrete details. In the revised version we will supply an explicit construction: a rank-1 measurement operator A whose RIP constant satisfies δ_2 < 1/3, together with a quadratic loss whose factored formulation at search rank r = 2r_* admits a second-order critical point whose recovery error is bounded away from zero, while the corresponding convex nuclear-norm problem recovers to the optimal rate. This will be placed in a new subsection with all parameters specified so that both the RIP satisfaction and the landscape failure can be checked directly. revision: yes
Referee: [Generalization section] Generalization to arbitrary convex losses (likely §6): the sharp conditions under which second-order critical points recover low-rank approximate minima of the convex problem or exact global optima must be checked for hidden dependencies on r/r_* that would undermine the interpolation claim in the overparametrized regime.

Authors: The generalization in Section 6 assumes restricted strong convexity and smoothness at rank r + r_*, exactly as in the RIP case. Consequently the error bounds and the conditions for approximate or exact recovery remain free of extra factors that grow with r/r_*. We will insert a short paragraph after the main generalization theorem that states this independence explicitly and sketches why the RSC/RSS constants do not degrade with the overparametrization ratio, thereby preserving the interpolation property. revision: partial

Circularity Check

0 steps flagged

No circularity: results derived from external RIP and RSC/RSS assumptions

full rationale

The paper conditions all recovery bounds, landscape guarantees, and interpolation claims on the external assumption that the measurement operator satisfies RIP at rank roughly r + r_* together with restricted strong convexity/smoothness of the loss. These are standard, independently verifiable conditions in the literature and are not derived from or equivalent to the paper's conclusions. No self-citation is used to justify uniqueness or to smuggle in an ansatz; the negative landscape examples are constructed explicitly under RIP to show that overparametrization does not automatically improve the landscape. The derivation chain therefore remains self-contained against the stated assumptions rather than reducing to a fit, renaming, or self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the restricted isometry property for the measurement operator and on restricted strong convexity plus smoothness of the loss; these are standard domain assumptions imported from prior convex recovery literature rather than new postulates.

axioms (2)

domain assumption Restricted isometry property (RIP) of the measurement operator at the relevant rank
Invoked to obtain statistically optimal recovery error bounds for both convex and nonconvex formulations.
domain assumption Restricted strong convexity and smoothness of the convex loss
Used to generalize the results from least-squares to arbitrary convex functions with nuclear-norm regularization.

pith-pipeline@v0.9.0 · 5562 in / 1489 out tokens · 44750 ms · 2026-05-10T02:07:11.853040+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1]

An overview of low-rank matrix recovery from incomplete observations,

M. A. Davenport and J. Romberg, “An overview of low-rank matrix recovery from incomplete observations,”IEEE J. Sel. Topics Signal Process., vol. 10, no. 4, pp. 608–622, 2016

work page 2016
[2]

LoRA: Low-rank adaptation of large language models,

E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” inProc. Int. Conf. Learn. Representations (ICLR), (Virtual conference), Apr. 2022

work page 2022
[3]

Fazel,Matrix rank minimization with applications

M. Fazel,Matrix rank minimization with applications. PhD thesis, Stanford University, 2002

work page 2002
[4]

Guaranteed minimum-rank solutions of linear matrix equa- tions via nuclear norm minimization,

B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equa- tions via nuclear norm minimization,”SIAM Rev., vol. 52, no. 3, pp. 471–501, 2010

work page 2010
[5]

Fixed point and Bregman iterative methods for matrix rank minimization,

S. Ma, D. Goldfarb, and L. Chen, “Fixed point and Bregman iterative methods for matrix rank minimization,”Math. Program., vol. 128, pp. 321–353, 2011

work page 2011
[6]

Estimation of high-dimensional low-rank matrices,

A. Rohde and A. B. Tsybakov, “Estimation of high-dimensional low-rank matrices,”Ann. Stat., vol. 39, no. 2, pp. 887–930, 2011

work page 2011
[7]

Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,

E. J. Cand` es and Y. Plan, “Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,”IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2342–2359, 2011

work page 2011
[8]

Estimation of (near) low-rank matrices with noise and high- dimensional scaling,

S. Negahban and M. J. Wainwright, “Estimation of (near) low-rank matrices with noise and high- dimensional scaling,”Ann. Stat., vol. 39, no. 2, pp. 1069–1097, 2011

work page 2011
[9]

Low rank matrix completion with exponential family noise,

J. Lafond, “Low rank matrix completion with exponential family noise,” inProc. Conf. Learn. Theory (COLT), (Paris), pp. 1224–1243, July 2015

work page 2015
[10]

Complexity bounds for second-order optimality in unconstrained optimization,

C. Cartis, N. I. M. Gould, and P. L. Toint, “Complexity bounds for second-order optimality in unconstrained optimization,”J. Complexity, vol. 28, no. 1, pp. 93–108, 2012

work page 2012
[11]

First-order methods almost always avoid strict saddle points,

J. D. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. I. Jordan, and B. Recht, “First-order methods almost always avoid strict saddle points,”Math. Program., vol. 176, pp. 311–337, 2019

work page 2019
[12]

Implicit regular- ization in matrix factorization,

S. Gunasekar, B. E. Woodworth, S. Bhojanapalli, B. Neyshabur, and N. Srebro, “Implicit regular- ization in matrix factorization,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 30, (Long Beach, CA, United States), Dec. 2017

work page 2017
[13]

Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations,

Y. Li, T. Ma, and H. Zhang, “Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations,” inProc. Conf. Learn. Theory (COLT), vol. 75, (Stockholm, Sweden), pp. 2–47, July 2018

work page 2018
[14]

Sharp global guarantees for nonconvex low-rank recovery in the noisy overparame- terized regime,

R. Y. Zhang, “Sharp global guarantees for nonconvex low-rank recovery in the noisy overparame- terized regime,”SIAM J. Optim., vol. 35, no. 3, pp. 2128–2154, 2025

work page 2025
[15]

Improved global landscape guarantees for low-rank factorization in synchronization,

S. Ling, “Improved global landscape guarantees for low-rank factorization in synchronization,” 2026

work page 2026
[16]

Phase retrieval and matrix sensing via benign and overparametrized nonconvex optimization,

A. D. McRae, “Phase retrieval and matrix sensing via benign and overparametrized nonconvex optimization,”IEEE Trans. Inf. Theory, 2026

work page 2026
[17]

Low solution rank of the matrix LASSO under RIP with consequences for rank- constrained algorithms,

A. D. McRae, “Low solution rank of the matrix LASSO under RIP with consequences for rank- constrained algorithms,”Math. Program., vol. 215, pp. 717–741, 2026

work page 2026
[18]

A high-dimensional statistical theory for convex and nonconvex matrix sensing,

J. Agterberg and R. Vidal, “A high-dimensional statistical theory for convex and nonconvex matrix sensing,” 2025

work page 2025
[19]

Burer-Monteiro factorizability of nuclear norm regularized optimization,

W. Ouyang, T. K. Pong, and M.-C. Yue, “Burer-Monteiro factorizability of nuclear norm regularized optimization,” 2025

work page 2025
[20]

Nocedal and S

J. Nocedal and S. J. Wright,Numerical Optimization. Springer, 2 ed., 2006

work page 2006
[21]

Sparse representation of a polytope and recovery of sparse signals and low-rank matrices,

T. T. Cai and A. Zhang, “Sparse representation of a polytope and recovery of sparse signals and low-rank matrices,”IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 122–132, 2014. 32

work page 2014
[22]

Simultaneous analysis of Lasso and Dantzig selector,

P. J. Bickel, Y. Ritov, and A. B. Tsybakov, “Simultaneous analysis of Lasso and Dantzig selector,” Ann. Stat., vol. 37, no. 4, pp. 1705–1732, 2009

work page 2009
[23]

Low-rank matrix recovery via regularized nuclear norm mini- mization,

W. Wang, F. Zhang, and J. Wang, “Low-rank matrix recovery via regularized nuclear norm mini- mization,”Appl. Comput. Harmon. Anal., vol. 54, pp. 1–19, 2021

work page 2021
[24]

An equivalence between critical points for rank constraints versus low-rank factorizations,

W. Ha, H. Liu, and R. F. Barber, “An equivalence between critical points for rank constraints versus low-rank factorizations,”SIAM J. Optim., vol. 30, no. 4, pp. 2927–2955, 2020

work page 2020
[25]

Srebro,Learning with Matrix Factorizations

N. Srebro,Learning with Matrix Factorizations. PhD thesis, Massachusetts Institute of Technology, 2004

work page 2004
[26]

Maximum-margin matrix factorization,

N. Srebro, J. Rennie, and T. Jaakkola, “Maximum-margin matrix factorization,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 17, (Vancouver, Canada), pp. 1329–1336, Dec. 2004

work page 2004
[27]

Matrix completion and low-rank SVD via fast alternating least squares,

T. Hastie, R. Mazumder, J. D. Lee, and R. Zadeh, “Matrix completion and low-rank SVD via fast alternating least squares,”J. Mach. Learn. Res., vol. 16, no. 104, pp. 3367–3402, 2015

work page 2015
[28]

The non-convex Burer-Monteiro approach works on smooth semidefinite programs,

N. Boumal, V. Voroninski, and A. Bandeira, “The non-convex Burer-Monteiro approach works on smooth semidefinite programs,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 29, (Barcelona), pp. 2757–2765, Dec. 2016

work page 2016
[29]

Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form,

S. Bhojanapalli, N. Boumal, P. Jain, and P. Netrapalli, “Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form,” inProc. Conf. Learn. Theory (COLT), (Stock- holm, Sweden), pp. 3243–3270, July 2018

work page 2018
[30]

Preconditioned subgradient method for composite optimiza- tion: overparameterization and fast convergence,

M. D´ ıaz, L. Jiang, and A. G. Labassi, “Preconditioned subgradient method for composite optimiza- tion: overparameterization and fast convergence,” 2025

work page 2025
[31]

Nonconvex optimization meets low-rank matrix factorization: An overview,

Y. Chi, Y. M. Lu, and Y. Chen, “Nonconvex optimization meets low-rank matrix factorization: An overview,”IEEE Trans. Signal Process., vol. 67, no. 20, pp. 5239–5269, 2019

work page 2019
[32]

Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity,

D. St¨ oger and Y. Zhu, “Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity,” Aug. 2024

work page 2024
[33]

Small random initialization is akin to spectral learning: Opti- mization and generalization guarantees for overparameterized low-rank matrix reconstruction,

D. St¨ oger and M. Soltanolkotabi, “Small random initialization is akin to spectral learning: Opti- mization and generalization guarantees for overparameterized low-rank matrix reconstruction,” in Proc. Conf. Neural Inf. Process. Syst. (NeurIPS), (Virtual conference), pp. 23831–23843, Dec. 2021

work page 2021
[34]

The power of preconditioning in overparameterized low- rank matrix sensing,

X. Xu, Y. Shen, Y. Chi, and C. Ma, “The power of preconditioning in overparameterized low- rank matrix sensing,” inProc. Int. Conf. Mach. Learn. (ICML), (Honolulu, HI, United States), pp. 38611–38654, 2023

work page 2023
[35]

Global convergence of sub-gradient method for robust matrix recovery: Small initialization, noisy measurements, and over-parameterization,

J. Ma and S. Fattahi, “Global convergence of sub-gradient method for robust matrix recovery: Small initialization, noisy measurements, and over-parameterization,”J. Mach. Learn. Res., vol. 24, no. 96, pp. 1–84, 2023

work page 2023
[36]

Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing,

M. Soltanolkotabi, D. St¨ oger, and C. Xie, “Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing,”IEEE Trans. Inf. Theory, vol. 71, no. 4, pp. 2991–3037, 2025

work page 2025
[37]

Global optimality of local search for low rank matrix recovery,

S. Bhojanapalli, B. Neyshabur, and N. Srebro, “Global optimality of local search for low rank matrix recovery,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), (Barcelona), pp. 3873–3881, Dec. 2016

work page 2016
[38]

Improved global guarantees for the nonconvex Burer-Monteiro factorization via rank overparameterization,

R. Y. Zhang, “Improved global guarantees for the nonconvex Burer-Monteiro factorization via rank overparameterization,”Math. Program., vol. 213, pp. 1009–1038, 2025

work page 2025
[39]

Geometric analysis of noisy low-rank matrix recovery in the exact parametrized and the overparametrized regimes,

Z. Ma, Y. Bi, J. Lavaei, and S. Sojoudi, “Geometric analysis of noisy low-rank matrix recovery in the exact parametrized and the overparametrized regimes,”INFORMS J. Opt., vol. 5, no. 4, pp. 356–375, 2023

work page 2023
[40]

Low-rank solutions of linear matrix equations via Procrustes flow,

S. Tu, R. Boczar, M. Simchowitz, M. Soltanolkotabi, and B. Recht, “Low-rank solutions of linear matrix equations via Procrustes flow,” inProc. Int. Conf. Mach. Learn. (ICML), vol. 48, (New York), pp. 964–973, June 2016. 33

work page 2016
[41]

Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach,

D. Park, A. Kyrillidis, C. Carmanis, and S. Sanghavi, “Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach,” inProc. Int. Conf. Artif. Intell. Statist. (AISTATS), (Fort Lauderdale, FL, United States), pp. 65–74, Apr. 2017

work page 2017
[42]

Global optimality in low-rank matrix optimization,

Z. Zhu, Q. Li, G. Tang, and M. B. Wakin, “Global optimality in low-rank matrix optimization,” IEEE Trans. Signal Process., vol. 66, no. 13, pp. 3614–3628, 2018

work page 2018
[43]

The global optimization geometry of low-rank matrix optimization,

Z. Zhu, Q. Li, G. Tang, and M. B. Wakin, “The global optimization geometry of low-rank matrix optimization,”IEEE Trans. Inf. Theory, vol. 67, no. 2, pp. 1308–1331, 2021

work page 2021
[44]

LoRA training provably converges to a low-rank global minimum or it fails loudly (but it probably won’t fail),

J. Kim, J. Kim, and E. K. Ryu, “LoRA training provably converges to a low-rank global minimum or it fails loudly (but it probably won’t fail),” inProc. Int. Conf. Mach. Learn. (ICML), (Vancouver, BC, Canada), pp. 30224–30247, July 2025

work page 2025
[45]

The non-convex geometry of low-rank matrix optimization,

Q. Li, Z. Zhu, and G. Tang, “The non-convex geometry of low-rank matrix optimization,”Inform. Inference., vol. 8, no. 1, pp. 51–96, 2019

work page 2019
[46]

On the absence of spurious local minima in nonlinear low-rank matrix recovery problems,

Y. Bi and J. Lavaei, “On the absence of spurious local minima in nonlinear low-rank matrix recovery problems,” inProc. Int. Conf. Artif. Intell. Statist. (AISTATS), (Virtual conference), pp. 379–387, Apr. 2021. 34

work page 2021

[1] [1]

An overview of low-rank matrix recovery from incomplete observations,

M. A. Davenport and J. Romberg, “An overview of low-rank matrix recovery from incomplete observations,”IEEE J. Sel. Topics Signal Process., vol. 10, no. 4, pp. 608–622, 2016

work page 2016

[2] [2]

LoRA: Low-rank adaptation of large language models,

E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” inProc. Int. Conf. Learn. Representations (ICLR), (Virtual conference), Apr. 2022

work page 2022

[3] [3]

Fazel,Matrix rank minimization with applications

M. Fazel,Matrix rank minimization with applications. PhD thesis, Stanford University, 2002

work page 2002

[4] [4]

Guaranteed minimum-rank solutions of linear matrix equa- tions via nuclear norm minimization,

B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equa- tions via nuclear norm minimization,”SIAM Rev., vol. 52, no. 3, pp. 471–501, 2010

work page 2010

[5] [5]

Fixed point and Bregman iterative methods for matrix rank minimization,

S. Ma, D. Goldfarb, and L. Chen, “Fixed point and Bregman iterative methods for matrix rank minimization,”Math. Program., vol. 128, pp. 321–353, 2011

work page 2011

[6] [6]

Estimation of high-dimensional low-rank matrices,

A. Rohde and A. B. Tsybakov, “Estimation of high-dimensional low-rank matrices,”Ann. Stat., vol. 39, no. 2, pp. 887–930, 2011

work page 2011

[7] [7]

Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,

E. J. Cand` es and Y. Plan, “Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,”IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2342–2359, 2011

work page 2011

[8] [8]

Estimation of (near) low-rank matrices with noise and high- dimensional scaling,

S. Negahban and M. J. Wainwright, “Estimation of (near) low-rank matrices with noise and high- dimensional scaling,”Ann. Stat., vol. 39, no. 2, pp. 1069–1097, 2011

work page 2011

[9] [9]

Low rank matrix completion with exponential family noise,

J. Lafond, “Low rank matrix completion with exponential family noise,” inProc. Conf. Learn. Theory (COLT), (Paris), pp. 1224–1243, July 2015

work page 2015

[10] [10]

Complexity bounds for second-order optimality in unconstrained optimization,

C. Cartis, N. I. M. Gould, and P. L. Toint, “Complexity bounds for second-order optimality in unconstrained optimization,”J. Complexity, vol. 28, no. 1, pp. 93–108, 2012

work page 2012

[11] [11]

First-order methods almost always avoid strict saddle points,

J. D. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. I. Jordan, and B. Recht, “First-order methods almost always avoid strict saddle points,”Math. Program., vol. 176, pp. 311–337, 2019

work page 2019

[12] [12]

Implicit regular- ization in matrix factorization,

S. Gunasekar, B. E. Woodworth, S. Bhojanapalli, B. Neyshabur, and N. Srebro, “Implicit regular- ization in matrix factorization,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 30, (Long Beach, CA, United States), Dec. 2017

work page 2017

[13] [13]

Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations,

Y. Li, T. Ma, and H. Zhang, “Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations,” inProc. Conf. Learn. Theory (COLT), vol. 75, (Stockholm, Sweden), pp. 2–47, July 2018

work page 2018

[14] [14]

Sharp global guarantees for nonconvex low-rank recovery in the noisy overparame- terized regime,

R. Y. Zhang, “Sharp global guarantees for nonconvex low-rank recovery in the noisy overparame- terized regime,”SIAM J. Optim., vol. 35, no. 3, pp. 2128–2154, 2025

work page 2025

[15] [15]

Improved global landscape guarantees for low-rank factorization in synchronization,

S. Ling, “Improved global landscape guarantees for low-rank factorization in synchronization,” 2026

work page 2026

[16] [16]

Phase retrieval and matrix sensing via benign and overparametrized nonconvex optimization,

A. D. McRae, “Phase retrieval and matrix sensing via benign and overparametrized nonconvex optimization,”IEEE Trans. Inf. Theory, 2026

work page 2026

[17] [17]

Low solution rank of the matrix LASSO under RIP with consequences for rank- constrained algorithms,

A. D. McRae, “Low solution rank of the matrix LASSO under RIP with consequences for rank- constrained algorithms,”Math. Program., vol. 215, pp. 717–741, 2026

work page 2026

[18] [18]

A high-dimensional statistical theory for convex and nonconvex matrix sensing,

J. Agterberg and R. Vidal, “A high-dimensional statistical theory for convex and nonconvex matrix sensing,” 2025

work page 2025

[19] [19]

Burer-Monteiro factorizability of nuclear norm regularized optimization,

W. Ouyang, T. K. Pong, and M.-C. Yue, “Burer-Monteiro factorizability of nuclear norm regularized optimization,” 2025

work page 2025

[20] [20]

Nocedal and S

J. Nocedal and S. J. Wright,Numerical Optimization. Springer, 2 ed., 2006

work page 2006

[21] [21]

Sparse representation of a polytope and recovery of sparse signals and low-rank matrices,

T. T. Cai and A. Zhang, “Sparse representation of a polytope and recovery of sparse signals and low-rank matrices,”IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 122–132, 2014. 32

work page 2014

[22] [22]

Simultaneous analysis of Lasso and Dantzig selector,

P. J. Bickel, Y. Ritov, and A. B. Tsybakov, “Simultaneous analysis of Lasso and Dantzig selector,” Ann. Stat., vol. 37, no. 4, pp. 1705–1732, 2009

work page 2009

[23] [23]

Low-rank matrix recovery via regularized nuclear norm mini- mization,

W. Wang, F. Zhang, and J. Wang, “Low-rank matrix recovery via regularized nuclear norm mini- mization,”Appl. Comput. Harmon. Anal., vol. 54, pp. 1–19, 2021

work page 2021

[24] [24]

An equivalence between critical points for rank constraints versus low-rank factorizations,

W. Ha, H. Liu, and R. F. Barber, “An equivalence between critical points for rank constraints versus low-rank factorizations,”SIAM J. Optim., vol. 30, no. 4, pp. 2927–2955, 2020

work page 2020

[25] [25]

Srebro,Learning with Matrix Factorizations

N. Srebro,Learning with Matrix Factorizations. PhD thesis, Massachusetts Institute of Technology, 2004

work page 2004

[26] [26]

Maximum-margin matrix factorization,

N. Srebro, J. Rennie, and T. Jaakkola, “Maximum-margin matrix factorization,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 17, (Vancouver, Canada), pp. 1329–1336, Dec. 2004

work page 2004

[27] [27]

Matrix completion and low-rank SVD via fast alternating least squares,

T. Hastie, R. Mazumder, J. D. Lee, and R. Zadeh, “Matrix completion and low-rank SVD via fast alternating least squares,”J. Mach. Learn. Res., vol. 16, no. 104, pp. 3367–3402, 2015

work page 2015

[28] [28]

The non-convex Burer-Monteiro approach works on smooth semidefinite programs,

N. Boumal, V. Voroninski, and A. Bandeira, “The non-convex Burer-Monteiro approach works on smooth semidefinite programs,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), vol. 29, (Barcelona), pp. 2757–2765, Dec. 2016

work page 2016

[29] [29]

Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form,

S. Bhojanapalli, N. Boumal, P. Jain, and P. Netrapalli, “Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form,” inProc. Conf. Learn. Theory (COLT), (Stock- holm, Sweden), pp. 3243–3270, July 2018

work page 2018

[30] [30]

Preconditioned subgradient method for composite optimiza- tion: overparameterization and fast convergence,

M. D´ ıaz, L. Jiang, and A. G. Labassi, “Preconditioned subgradient method for composite optimiza- tion: overparameterization and fast convergence,” 2025

work page 2025

[31] [31]

Nonconvex optimization meets low-rank matrix factorization: An overview,

Y. Chi, Y. M. Lu, and Y. Chen, “Nonconvex optimization meets low-rank matrix factorization: An overview,”IEEE Trans. Signal Process., vol. 67, no. 20, pp. 5239–5269, 2019

work page 2019

[32] [32]

Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity,

D. St¨ oger and Y. Zhu, “Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity,” Aug. 2024

work page 2024

[33] [33]

Small random initialization is akin to spectral learning: Opti- mization and generalization guarantees for overparameterized low-rank matrix reconstruction,

D. St¨ oger and M. Soltanolkotabi, “Small random initialization is akin to spectral learning: Opti- mization and generalization guarantees for overparameterized low-rank matrix reconstruction,” in Proc. Conf. Neural Inf. Process. Syst. (NeurIPS), (Virtual conference), pp. 23831–23843, Dec. 2021

work page 2021

[34] [34]

The power of preconditioning in overparameterized low- rank matrix sensing,

X. Xu, Y. Shen, Y. Chi, and C. Ma, “The power of preconditioning in overparameterized low- rank matrix sensing,” inProc. Int. Conf. Mach. Learn. (ICML), (Honolulu, HI, United States), pp. 38611–38654, 2023

work page 2023

[35] [35]

Global convergence of sub-gradient method for robust matrix recovery: Small initialization, noisy measurements, and over-parameterization,

J. Ma and S. Fattahi, “Global convergence of sub-gradient method for robust matrix recovery: Small initialization, noisy measurements, and over-parameterization,”J. Mach. Learn. Res., vol. 24, no. 96, pp. 1–84, 2023

work page 2023

[36] [36]

Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing,

M. Soltanolkotabi, D. St¨ oger, and C. Xie, “Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing,”IEEE Trans. Inf. Theory, vol. 71, no. 4, pp. 2991–3037, 2025

work page 2025

[37] [37]

Global optimality of local search for low rank matrix recovery,

S. Bhojanapalli, B. Neyshabur, and N. Srebro, “Global optimality of local search for low rank matrix recovery,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), (Barcelona), pp. 3873–3881, Dec. 2016

work page 2016

[38] [38]

Improved global guarantees for the nonconvex Burer-Monteiro factorization via rank overparameterization,

R. Y. Zhang, “Improved global guarantees for the nonconvex Burer-Monteiro factorization via rank overparameterization,”Math. Program., vol. 213, pp. 1009–1038, 2025

work page 2025

[39] [39]

Geometric analysis of noisy low-rank matrix recovery in the exact parametrized and the overparametrized regimes,

Z. Ma, Y. Bi, J. Lavaei, and S. Sojoudi, “Geometric analysis of noisy low-rank matrix recovery in the exact parametrized and the overparametrized regimes,”INFORMS J. Opt., vol. 5, no. 4, pp. 356–375, 2023

work page 2023

[40] [40]

Low-rank solutions of linear matrix equations via Procrustes flow,

S. Tu, R. Boczar, M. Simchowitz, M. Soltanolkotabi, and B. Recht, “Low-rank solutions of linear matrix equations via Procrustes flow,” inProc. Int. Conf. Mach. Learn. (ICML), vol. 48, (New York), pp. 964–973, June 2016. 33

work page 2016

[41] [41]

Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach,

D. Park, A. Kyrillidis, C. Carmanis, and S. Sanghavi, “Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach,” inProc. Int. Conf. Artif. Intell. Statist. (AISTATS), (Fort Lauderdale, FL, United States), pp. 65–74, Apr. 2017

work page 2017

[42] [42]

Global optimality in low-rank matrix optimization,

Z. Zhu, Q. Li, G. Tang, and M. B. Wakin, “Global optimality in low-rank matrix optimization,” IEEE Trans. Signal Process., vol. 66, no. 13, pp. 3614–3628, 2018

work page 2018

[43] [43]

The global optimization geometry of low-rank matrix optimization,

Z. Zhu, Q. Li, G. Tang, and M. B. Wakin, “The global optimization geometry of low-rank matrix optimization,”IEEE Trans. Inf. Theory, vol. 67, no. 2, pp. 1308–1331, 2021

work page 2021

[44] [44]

LoRA training provably converges to a low-rank global minimum or it fails loudly (but it probably won’t fail),

J. Kim, J. Kim, and E. K. Ryu, “LoRA training provably converges to a low-rank global minimum or it fails loudly (but it probably won’t fail),” inProc. Int. Conf. Mach. Learn. (ICML), (Vancouver, BC, Canada), pp. 30224–30247, July 2025

work page 2025

[45] [45]

The non-convex geometry of low-rank matrix optimization,

Q. Li, Z. Zhu, and G. Tang, “The non-convex geometry of low-rank matrix optimization,”Inform. Inference., vol. 8, no. 1, pp. 51–96, 2019

work page 2019

[46] [46]

On the absence of spurious local minima in nonlinear low-rank matrix recovery problems,

Y. Bi and J. Lavaei, “On the absence of spurious local minima in nonlinear low-rank matrix recovery problems,” inProc. Int. Conf. Artif. Intell. Statist. (AISTATS), (Virtual conference), pp. 379–387, Apr. 2021. 34

work page 2021