Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees

Martin J. Wainwright; Yudong Chen

arxiv: 1509.03025 · v1 · pith:GA3IGLCPnew · submitted 2015-09-10 · 🧮 math.ST · cs.LG· stat.ML· stat.TH

Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees

Yudong Chen , Martin J. Wainwright This is my paper

classification 🧮 math.ST cs.LGstat.MLstat.TH

keywords matrixproblemswhendescentgradientproblemprojectedcompletion

0 comments

read the original abstract

Optimization problems with rank constraints arise in many applications, including matrix regression, structured PCA, matrix completion and matrix decomposition problems. An attractive heuristic for solving such problems is to factorize the low-rank matrix, and to run projected gradient descent on the nonconvex factorized optimization problem. The goal of this problem is to provide a general theoretical framework for understanding when such methods work well, and to characterize the nature of the resulting fixed point. We provide a simple set of conditions under which projected gradient descent, when given a suitable initialization, converges geometrically to a statistically useful solution. Our results are applicable even when the initial solution is outside any region of local convexity, and even when the problem is globally concave. Working in a non-asymptotic framework, we show that our conditions are satisfied for a wide range of concrete models, including matrix regression, structured PCA, matrix completion with real and quantized observations, matrix decomposition, and graph clustering problems. Simulation results show excellent agreement with the theoretical predictions.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sample efficient inductive matrix completion with noise and inexact side information
stat.ML 2026-05 unverdicted novelty 7.0

Nonconvex projected gradient descent for noisy inductive matrix completion achieves linear convergence and order-optimal error at sample complexity scaling with side-information dimension a instead of ambient dimension n.
Robust Uniform Recovery of Structured Signals from Nonlinear Observations
cs.IT 2026-04 unverdicted novelty 7.0

RAIC unifies uniform recovery of structured signals from nonlinear observations via PGD, yielding error rates comparable to nonuniform guarantees up to log factors in sparse and 1-bit settings.
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
cs.LG 2024-03 conditional novelty 7.0

GaLore performs full-parameter LLM training with up to 65.5% less optimizer memory by projecting gradients onto a low-rank subspace at each step, matching full-rank performance on LLaMA pre-training and RoBERTa fine-tuning.
GWT: Scalable Optimizer State Compression for Large Language Model Training
cs.LG 2025-01 unverdicted novelty 6.0

GWT projects gradients into wavelet subspaces to compress optimizer states for memory-efficient LLM training while claiming performance parity with full-rank updates.
Convexity in Disguise: A Theoretical Framework for Nonconvex Low-Rank Matrix Estimation
stat.ML 2026-05 unverdicted novelty 5.0

Nonconvex low-rank matrix estimation procedures are shown to be equivalent to locally strongly convex formulations via a benign regularizer that does not change the algorithm's update rule.
Low-Rank Adaptation Redux for Large Models
cs.LG 2026-04 unverdicted novelty 3.0

An overview revisits LoRA variants by categorizing advances in architectural design, efficient optimization, and applications while linking them to classical signal processing tools for principled fine-tuning.