Sparse PCA: A New Scalable Estimator Based On Integer Programming

Kayhan Behdin; Rahul Mazumder

arxiv: 2109.11142 · v4 · submitted 2021-09-23 · 📊 stat.ME · math.ST· stat.TH

Sparse PCA: A New Scalable Estimator Based On Integer Programming

Kayhan Behdin , Rahul Mazumder This is my paper

Pith reviewed 2026-05-24 13:51 UTC · model grok-4.3

classification 📊 stat.ME math.STstat.TH

keywords sparse PCAmixed integer programmingspiked covariance modelsupport recoveryestimation error boundsscalable optimizationGaussian noise model

0 comments

The pith

A new mixed integer program for sparse PCA uses the spiked covariance model to scale to 20,000 features with statistical guarantees on error and support recovery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates a new estimator for sparse principal component analysis as a mixed integer program. It derives the formulation from the spiked covariance model and properties of the multivariate Gaussian distribution, then proves guarantees for estimation error and support recovery. The guarantees continue to hold under departures from the exact model and when the optimization problem is solved only approximately. A custom solver for the program runs on instances with up to 20,000 features in minutes and produces results with competitive statistical properties on both synthetic and real data.

Core claim

The central claim is that a mixed integer program derived from the spiked covariance model yields a sparse PCA estimator with provable bounds on estimation error and support recovery; the program can be solved at scale with a custom algorithm that outperforms earlier exact MIP approaches while the statistical guarantees remain valid for approximate solutions and modest model misspecification.

What carries the argument

A mixed integer program that encodes the selection of sparse principal components by exploiting the structure of the spiked covariance model and the multivariate Gaussian likelihood.

If this is right

The estimator recovers the true support with high probability under the spiked model.
Statistical guarantees continue to apply when the MIP is solved only to moderate accuracy.
The same guarantees hold when the data deviate mildly from the exact spiked model.
Problems with twenty thousand features become computationally feasible on standard hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same model-based MIP construction may be adaptable to other sparse estimation tasks that admit a low-rank plus noise decomposition.
If warm-start or branching heuristics can be further improved, the method could reach dimensions beyond twenty thousand features.
Because real data rarely match the spiked model exactly, empirical checks on non-Gaussian or multi-spike covariances would clarify practical robustness.

Load-bearing premise

The data are generated from (or are close to) a spiked covariance model with multivariate Gaussian noise.

What would settle it

A simulation in which data are drawn from a covariance matrix lacking a single dominant spike and the estimator recovers an incorrect support set with high probability.

Figures

Figures reproduced from arXiv: 2109.11142 by Kayhan Behdin, Rahul Mazumder.

**Figure 1.** Figure 1: Estimation and support recovery error as available from different methods on synthetic data with p = 10000 and different values of n (along the x-axis), as discussed in the text (Section 5.1.2). The left panels show results with s = 5 and right panels s = 10. The top panels compare estimation performance and bottom ones compare support recovery. Our proposed approach results in high-quality estimation perf… view at source ↗

**Figure 2.** Figure 2: Numerical results for the synthetic dataset with p = 10000, n = 7500 in Section 5.1.3. The left panel shows the estimation performance and the right panel shows the support recovery performance. s = 5 s = 10 |sin ∠(ˆu, u ∗ )| 0.025 0.050 0.075 0.100 5000 10000 15000 20000 n method CovThresh SPCA−SLS SPCA−SLSR SPCAvSLR 0.0 0.2 0.4 0.6 5000 10000 15000 20000 n method CovThresh SPCA−SLS SPCA−SLSR SPCAvSLR [P… view at source ↗

**Figure 3.** Figure 3: Numerical results for the synthetic dataset with p = 20000 in Section 5.1.4. The left panel shows the statistical performance for s = 5. The right panel shows the statistical performance for s = 10. λ chosen as in Section 5.1.1), TrunPow, SPCAvSLR and CovTresh for this data matrix and two values of s ∈ {4, 5}. Parameter selection is done similar to the synthetic data experiments in Section 5.1. We use the … view at source ↗

**Figure 4.** Figure 4: Comparison of estimated PCs by different algorithms for the real dataset in Section 5.2. The left panel shows the results for s = 4 and the right panel shows the results for s = 5. In both cases, SPCA-SLS reaches the optimality gap of less than 12% and SPCA-SLSR reaches the optimality gap of less than 10% after 10 minutes. to a denser support. 6 Conclusion In this paper, we consider a discrete optimization… view at source ↗

read the original abstract

We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model. Recent work has shown that the SPCA problem can be reformulated as a Mixed Integer Program (MIP) and can be solved to global optimality, leading to estimators that are known to enjoy optimal statistical properties. However, prior MIP algorithms for SPCA appear to be limited in terms of scalability to up to a thousand features or so. In this paper, we propose a new estimator for SPCA which can be formulated as a MIP. Different from earlier work, we make use of the underlying spiked covariance model and properties of the multivariate Gaussian distribution to arrive at our estimator. We establish statistical guarantees for our proposed estimator in terms of estimation error and support recovery. We derive guarantees under departures from the spiked covariance model, and for approximate solutions to the optimization problem. We propose a custom algorithm to solve the MIP, which scales better than off-the-shelf solvers, and demonstrate that our approach can be much more computationally attractive compared to earlier exact MIP-based approaches for the SPCA problem. Our numerical experiments on synthetic and real datasets show that our algorithms can address problems with up to 20,000 features in minutes; and generally result in favorable statistical properties compared to existing popular approaches for SPCA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a MIP formulation for SPCA that uses the spiked Gaussian model to reach 20k features, with a custom solver that beats prior exact methods on speed.

read the letter

The main point is a new MIP estimator for sparse PCA that builds the objective directly from the spiked covariance model and multivariate Gaussian properties. This produces a custom algorithm that scales to problems with 20,000 features in minutes, well beyond the roughly 1,000-feature limit of earlier MIP approaches for the same problem. The experiments on synthetic and real data show competitive or better statistical performance alongside the runtime gains. They also state guarantees for estimation error and support recovery, including under model departures and for approximate solutions. That combination of scaling and reported guarantees is the concrete advance. The formulation is explicitly model-dependent, which is how they get the speed, but it means the approach is tied to the spiked Gaussian setup used for both derivation and proofs. The departure analysis is claimed to cover some robustness, yet if the noise has heavier tails or moments that break the Gaussian-derived equivalence, the support recovery results may not carry over as stated. The paper does not appear to reduce to prior estimators or rely on circular self-citations. This is aimed at people working on exact or near-exact high-dimensional estimation methods, especially those comparing MIP formulations or needing practical solvers for SPCA. A reader focused on computational statistics would get value from the scaling numbers and the custom algorithm details. It deserves peer review because the scalability claim is testable and the formulation is distinct enough to warrant checking the proofs and experiments in full.

Referee Report

2 major / 2 minor

Summary. The paper proposes a new mixed-integer programming (MIP) formulation for sparse principal component analysis (SPCA) under the spiked covariance model. It derives the MIP objective using properties of the multivariate Gaussian distribution, establishes statistical guarantees for estimation error and support recovery both under the exact model and under departures from it (as well as for approximate solutions), introduces a custom solver claimed to scale to 20,000 features, and reports favorable performance versus prior MIP and other SPCA methods on synthetic and real data.

Significance. If the statistical guarantees are correctly established and the custom solver delivers the claimed scalability without sacrificing the exactness properties, the work would supply a practically usable, theoretically supported exact method for high-dimensional SPCA that improves on the scalability limits of earlier MIP formulations (previously limited to roughly 1,000 features).

major comments (2)

[Estimator derivation and guarantees under departures] The section deriving the estimator and the subsequent section on statistical guarantees under departures: the MIP objective is obtained by explicit use of the multivariate Gaussian likelihood under the spiked covariance model; the departure analysis is described as controlling bias terms, but it is not shown whether this preserves the exact equivalence between the original optimization problem and the MIP formulation. If the MIP form is tied to Gaussianity, the support-recovery guarantee may fail for distributions with infinite fourth moments even when the covariance remains spiked.
[Guarantees for approximate solutions] The section on approximate solutions: the claim that statistical guarantees extend to approximate MIP solutions is load-bearing for the practical utility of the custom solver, yet the error bounds relating the approximation gap to the estimation and support-recovery errors are not stated explicitly enough to verify whether they remain useful when the solver is terminated early on large instances.

minor comments (2)

[Abstract and experimental results] The abstract states that the approach 'can address problems with up to 20,000 features in minutes' but does not specify the hardware, MIP solver tolerances, or exact timing metrics used in the experiments; this detail belongs in the experimental section for reproducibility.
[Notation and model setup] Notation for the spiked model parameters (e.g., spike strength, noise variance) should be introduced once and used consistently when stating the estimation-error bounds.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make.

read point-by-point responses

Referee: [Estimator derivation and guarantees under departures] The section deriving the estimator and the subsequent section on statistical guarantees under departures: the MIP objective is obtained by explicit use of the multivariate Gaussian likelihood under the spiked covariance model; the departure analysis is described as controlling bias terms, but it is not shown whether this preserves the exact equivalence between the original optimization problem and the MIP formulation. If the MIP form is tied to Gaussianity, the support-recovery guarantee may fail for distributions with infinite fourth moments even when the covariance remains spiked.

Authors: The MIP formulation is derived from the exact Gaussian likelihood under the spiked model, so the equivalence to the original problem holds under that model. For departures, we bound the difference between the Gaussian objective and the true (non-Gaussian) objective by a bias term whose size is controlled under finite fourth-moment assumptions; the MIP is then solved exactly on the Gaussian objective, and the resulting estimator is shown to be close to the true sparse vector via this bias control. The support-recovery guarantee under departures therefore requires the same moment conditions used for the bias bound. We will revise the manuscript to state these moment conditions explicitly and to clarify that the equivalence is with respect to the Gaussian objective (not the departed distribution). revision: partial
Referee: [Guarantees for approximate solutions] The section on approximate solutions: the claim that statistical guarantees extend to approximate MIP solutions is load-bearing for the practical utility of the custom solver, yet the error bounds relating the approximation gap to the estimation and support-recovery errors are not stated explicitly enough to verify whether they remain useful when the solver is terminated early on large instances.

Authors: We agree that the dependence of the statistical error on the MIP approximation gap should be stated more explicitly. In the current draft we show that an additive optimality gap of size ε increases the estimation error by a term linear in ε (scaled by problem constants) and that support recovery continues to hold provided ε is smaller than a gap proportional to the minimal signal strength. We will expand this section with the precise constants and a short discussion of how large an ε remains tolerable for early termination on instances with 10,000–20,000 features. revision: yes

Circularity Check

0 steps flagged

No circularity: MIP estimator and guarantees derived from external spiked covariance model

full rationale

The paper formulates its MIP estimator by invoking the standard spiked covariance model plus multivariate Gaussian properties (abstract), which are external to the paper and not self-defined or fitted from the target data. Guarantees for estimation error and support recovery are stated under the model and explicit departures, without reducing to a fitted parameter or self-citation chain. No equations or steps in the provided text exhibit self-definitional equivalence, fitted-input-as-prediction, or load-bearing self-citation that collapses the derivation to its inputs by construction. The central claim remains independent of the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; the central claim rests on the spiked covariance model and Gaussian assumptions used to derive both the MIP and the guarantees. No free parameters or invented entities are mentioned.

axioms (1)

domain assumption Data follows (or is close to) the spiked covariance model with multivariate Gaussian distribution
Invoked to arrive at the estimator and to establish statistical guarantees (abstract).

pith-pipeline@v0.9.0 · 5762 in / 1163 out tokens · 31600 ms · 2026-05-24T13:51:38.601553+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We make use of the underlying spiked covariance model and properties of the multivariate Gaussian distribution to arrive at our estimator... min β,z ∑_j ||X:,j − ∑_{i≠j} β_{i,j} X:,i ||₂² s.t. z∈{0,1}^p, ∑ zi ≤ s, β_{i,j}(1−z_i)=β_{i,j}(1−z_j)=0
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages

[1]

u rk, Alper Atamt \

M Selim Akt \"u rk, Alper Atamt \"u rk, and Sinan G \"u rel. A strong conic quadratic reformulation for machine-job assignment with controllable processing times. Operations Research Letters, 37 0 (3): 0 187--191, 2009

work page 2009
[2]

Amini and Martin J

Arash A. Amini and Martin J. Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist., 37 0 (5B): 0 2877--2921, 10 2009. doi:10.1214/08-AOS664. URL https://doi.org/10.1214/08-AOS664

work page doi:10.1214/08-aos664 2009
[3]

Archetypal analysis for sparse nonnegative matrix factorization: Robustness under misspecification

Kayhan Behdin and Rahul Mazumder. Archetypal analysis for sparse nonnegative matrix factorization: Robustness under misspecification. arXiv preprint arXiv:2104.03527, 2021

work page arXiv 2021
[4]

Certifiably optimal sparse principal component analysis

Lauren Berk and Dimitris Bertsimas. Certifiably optimal sparse principal component analysis. Mathematical Programming Computation, 11 0 (3): 0 381--420, 2019

work page 2019
[5]

Optimal detection of sparse principal components in high dimension

Quentin Berthet and Philippe Rigollet. Optimal detection of sparse principal components in high dimension. Ann. Statist., 41 0 (4): 0 1780--1815, 08 2013. doi:10.1214/13-AOS1127. URL https://doi.org/10.1214/13-AOS1127

work page doi:10.1214/13-aos1127 2013
[6]

Nonlinear programming

Dimitri P Bertsekas. Nonlinear programming. Journal of the Operational Research Society, 48 0 (3): 0 334--334, 1997

work page 1997
[7]

Sparse high-dimensional regression: Exact scalable algorithms and phase transitions

Dimitris Bertsimas and Bart Van Parys. Sparse high-dimensional regression: Exact scalable algorithms and phase transitions. The Annals of Statistics, 48 0 (1): 0 300--323, 2020

work page 2020
[8]

Best subset selection via a modern optimization lens

Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. The annals of statistics, 44 0 (2): 0 813--852, 2016

work page 2016
[9]

Solving large-scale sparse pca to certifiable (near) optimality

Dimitris Bertsimas, Ryan Cory-Wright, and Jean Pauphilet. Solving large-scale sparse pca to certifiable (near) optimality. arXiv preprint arXiv:2005.05195, 2020

work page arXiv 2005
[10]

Convex optimization

Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004

work page 2004
[11]

Sparse pca from sparse linear regression

Guy Bresler, Sung Min Park, and Madalina Persu. Sparse pca from sparse linear regression. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, page 10965–10975, Red Hook, NY, USA, 2018. Curran Associates Inc

work page 2018
[12]

Tony Cai and Harrison H

T. Tony Cai and Harrison H. Zhou. Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist., 40 0 (5): 0 2389--2420, 10 2012. doi:10.1214/12-AOS998. URL https://doi.org/10.1214/12-AOS998

work page doi:10.1214/12-aos998 2012
[13]

A direct formulation for sparse pca using semidefinite programming

Alexandre d'Aspremont, Laurent E Ghaoui, Michael I Jordan, and Gert R Lanckriet. A direct formulation for sparse pca using semidefinite programming. In Advances in neural information processing systems, pages 41--48, 2005

work page 2005
[14]

Sparse pca via covariance thresholding

Yash Deshpande and Andrea Montanari. Sparse pca via covariance thresholding. The Journal of Machine Learning Research, 17 0 (1): 0 4913--4953, 2016

work page 2016
[15]

A convex integer programming approach for optimal sparse pca

Santanu S Dey, Rahul Mazumder, and Guanyi Wang. A convex integer programming approach for optimal sparse pca. arXiv preprint arXiv:1810.09062, 2018

work page arXiv 2018
[16]

An outer-approximation algorithm for a class of mixed-integer nonlinear programs

Marco A Duran and Ignacio E Grossmann. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical programming, 36 0 (3): 0 307--339, 1986

work page 1986
[17]

Optimal solutions for sparse principal component analysis

Alexandre d’Aspremont, Francis Bach, and Laurent El Ghaoui. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9 0 (Jul): 0 1269--1294, 2008

work page 2008
[18]

When is best subset selection the" best"? arXiv preprint arXiv:2007.01478, 2020

Jianqing Fan, Yongyi Guo, and Ziwei Zhu. When is best subset selection the" best"? arXiv preprint arXiv:2007.01478, 2020

work page arXiv 2007
[19]

Frangioni and C

A. Frangioni and C. Gentile. Perspective cuts for a class of convex 0–1 mixed integer programs. Math. Program., 106 0 (2): 0 225–236, May 2006. ISSN 0025-5610

work page 2006
[20]

Sparse principal component analysis via axis-aligned random projections

Milana Gataric, Tengyao Wang, and Richard J Samworth. Sparse principal component analysis via axis-aligned random projections. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82 0 (2): 0 329--359, 2020

work page 2020
[21]

Perspective reformulations of mixed integer nonlinear programs with indicator variables

Oktay G \"u nl \"u k and Jeff Linderoth. Perspective reformulations of mixed integer nonlinear programs with indicator variables. Mathematical programming, 124 0 (1): 0 183--205, 2010

work page 2010
[22]

Result analysis of the nips 2003 feature selection challenge

Isabelle Guyon, Steve Gunn, Asa Ben-Hur, and Gideon Dror. Result analysis of the nips 2003 feature selection challenge. In Advances in Neural Information Processing Systems, volume 17. MIT Press, 2005

work page 2003
[23]

Statistical learning with sparsity: the lasso and generalizations

Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity: the lasso and generalizations. Chapman and Hall/CRC, 2019

work page 2019
[24]

Fast best subset selection: Coordinate descent and local combinatorial optimization algorithms

Hussein Hazimeh and Rahul Mazumder. Fast best subset selection: Coordinate descent and local combinatorial optimization algorithms. Operations Research, 68 0 (5): 0 1517--1537, 2020

work page 2020
[25]

Sparse regression at scale: Branch-and-bound rooted in first-order optimization

Hussein Hazimeh, Rahul Mazumder, and Ali Saab. Sparse regression at scale: Branch-and-bound rooted in first-order optimization. arXiv preprint arXiv:2004.06152, 2020

work page arXiv 2004
[26]

Analysis of a complex of statistical variables into principal components

Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of educational psychology, 24 0 (6): 0 417, 1933

work page 1933
[27]

Johnstone

Iain M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29 0 (2): 0 295--327, 04 2001. doi:10.1214/aos/1009210544. URL https://doi.org/10.1214/aos/1009210544

work page doi:10.1214/aos/1009210544 2001
[28]

On consistency and sparsity for principal components analysis in high dimensions

Iain M Johnstone and Arthur Yu Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104 0 (486): 0 682--693, 2009

work page 2009
[29]

A modified principal component technique based on the lasso

Ian T Jolliffe, Nickolay T Trendafilov, and Mudassir Uddin. A modified principal component technique based on the lasso. Journal of computational and Graphical Statistics, 12 0 (3): 0 531--547, 2003

work page 2003
[30]

Do semidefinite relaxations solve sparse pca up to the information limit? The Annals of Statistics, 43 0 (3): 0 1300--1322, 2015

Robert Krauthgamer, Boaz Nadler, Dan Vilenchik, et al. Do semidefinite relaxations solve sparse pca up to the information limit? The Annals of Statistics, 43 0 (3): 0 1300--1322, 2015

work page 2015
[31]

Jing Lei and Vincent Q. Vu. Sparsistency and agnostic inference in sparse PCA . The Annals of Statistics, 43 0 (1): 0 299 -- 322, 2015

work page 2015
[32]

Exact and approximation algorithms for sparse pca

Yongchun Li and Weijun Xie. Exact and approximation algorithms for sparse pca. arXiv preprint arXiv:2008.12438, 2020

work page arXiv 2008
[33]

Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint

Ronny Luss and Marc Teboulle. Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. siam REVIEW, 55 0 (1): 0 65--98, 2013

work page 2013
[34]

High-dimensional graphs and variable selection with the Lasso

Nicolai Meinshausen and Peter Bühlmann. High-dimensional graphs and variable selection with the Lasso . The Annals of Statistics, 34 0 (3): 0 1436 -- 1462, 2006

work page 2006
[35]

Spectral bounds for sparse pca: Exact and greedy algorithms

Baback Moghaddam, Yair Weiss, and Shai Avidan. Spectral bounds for sparse pca: Exact and greedy algorithms. In Advances in neural information processing systems, pages 915--922, 2006

work page 2006
[36]

Asymptotics of sample eigenstructure for a large dimensional spiked covariance model

Debashis Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17 0 (4): 0 1617--1642, 2007

work page 2007
[37]

Raskutti , M

G. Raskutti , M. J. Wainwright , and B. Yu . Minimax rates of estimation for high-dimensional linear regression over _q -balls. IEEE Transactions on Information Theory, 57 0 (10): 0 6976--6994, 2011

work page 2011
[38]

Alternating maximization: Unifying framework for 8 sparse pca formulations and efficient parallel codes

Peter Richt \'a rik, Majid Jahani, Selin Damla Ahipa s ao g lu, and Martin Tak \'a c . Alternating maximization: Unifying framework for 8 sparse pca formulations and efficient parallel codes. Optimization and Engineering, pages 1--27, 2020

work page 2020
[39]

High dimensional statistics

Phillippe Rigollet and Jan-Christian H \"u tter. High dimensional statistics. Lecture notes for course 18S997, 2015

work page 2015
[40]

Hanson-Wright inequality and sub-gaussian concentration

Mark Rudelson and Roman Vershynin. Hanson-Wright inequality and sub-gaussian concentration . Electronic Communications in Probability, 18 0 (none): 0 1 -- 9, 2013

work page 2013
[41]

Stewart and Ji-guang Sun

G.W. Stewart and Ji-guang Sun. Matrix Perturbation Theory. Computer science and scientific computing. Academic Press, 1990

work page 1990
[42]

High-dimensional probability: An introduction with applications in data science, volume 47

Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018

work page 2018
[43]

Samworth

Tengyao Wang, Quentin Berthet, and Richard J. Samworth. Statistical and computational trade-offs in estimation of sparse principal components . The Annals of Statistics, 44 0 (5): 0 1896 -- 1930, 2016

work page 1930
[44]

Perturbation bounds in connection with singular value decomposition

Per- ke Wedin. Perturbation bounds in connection with singular value decomposition. BIT Numerical Mathematics, 12 0 (1): 0 99--111, 1972

work page 1972
[45]

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis

Daniela M Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10 0 (3): 0 515--534, 2009

work page 2009
[46]

Truncated power method for sparse eigenvalue problems

Xiao-Tong Yuan and Tong Zhang. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research, 14 0 (Apr): 0 899--925, 2013

work page 2013
[47]

The Schur complement and its applications, volume 4

Fuzhen Zhang. The Schur complement and its applications, volume 4. Springer Science & Business Media, 2006

work page 2006
[48]

Sparse principal component analysis

Hui Zou, Trevor Hastie, and Robert Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15 0 (2): 0 265--286, 2006

work page 2006

[1] [1]

u rk, Alper Atamt \

M Selim Akt \"u rk, Alper Atamt \"u rk, and Sinan G \"u rel. A strong conic quadratic reformulation for machine-job assignment with controllable processing times. Operations Research Letters, 37 0 (3): 0 187--191, 2009

work page 2009

[2] [2]

Amini and Martin J

Arash A. Amini and Martin J. Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist., 37 0 (5B): 0 2877--2921, 10 2009. doi:10.1214/08-AOS664. URL https://doi.org/10.1214/08-AOS664

work page doi:10.1214/08-aos664 2009

[3] [3]

Archetypal analysis for sparse nonnegative matrix factorization: Robustness under misspecification

Kayhan Behdin and Rahul Mazumder. Archetypal analysis for sparse nonnegative matrix factorization: Robustness under misspecification. arXiv preprint arXiv:2104.03527, 2021

work page arXiv 2021

[4] [4]

Certifiably optimal sparse principal component analysis

Lauren Berk and Dimitris Bertsimas. Certifiably optimal sparse principal component analysis. Mathematical Programming Computation, 11 0 (3): 0 381--420, 2019

work page 2019

[5] [5]

Optimal detection of sparse principal components in high dimension

Quentin Berthet and Philippe Rigollet. Optimal detection of sparse principal components in high dimension. Ann. Statist., 41 0 (4): 0 1780--1815, 08 2013. doi:10.1214/13-AOS1127. URL https://doi.org/10.1214/13-AOS1127

work page doi:10.1214/13-aos1127 2013

[6] [6]

Nonlinear programming

Dimitri P Bertsekas. Nonlinear programming. Journal of the Operational Research Society, 48 0 (3): 0 334--334, 1997

work page 1997

[7] [7]

Sparse high-dimensional regression: Exact scalable algorithms and phase transitions

Dimitris Bertsimas and Bart Van Parys. Sparse high-dimensional regression: Exact scalable algorithms and phase transitions. The Annals of Statistics, 48 0 (1): 0 300--323, 2020

work page 2020

[8] [8]

Best subset selection via a modern optimization lens

Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. The annals of statistics, 44 0 (2): 0 813--852, 2016

work page 2016

[9] [9]

Solving large-scale sparse pca to certifiable (near) optimality

Dimitris Bertsimas, Ryan Cory-Wright, and Jean Pauphilet. Solving large-scale sparse pca to certifiable (near) optimality. arXiv preprint arXiv:2005.05195, 2020

work page arXiv 2005

[10] [10]

Convex optimization

Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004

work page 2004

[11] [11]

Sparse pca from sparse linear regression

Guy Bresler, Sung Min Park, and Madalina Persu. Sparse pca from sparse linear regression. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, page 10965–10975, Red Hook, NY, USA, 2018. Curran Associates Inc

work page 2018

[12] [12]

Tony Cai and Harrison H

T. Tony Cai and Harrison H. Zhou. Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist., 40 0 (5): 0 2389--2420, 10 2012. doi:10.1214/12-AOS998. URL https://doi.org/10.1214/12-AOS998

work page doi:10.1214/12-aos998 2012

[13] [13]

A direct formulation for sparse pca using semidefinite programming

Alexandre d'Aspremont, Laurent E Ghaoui, Michael I Jordan, and Gert R Lanckriet. A direct formulation for sparse pca using semidefinite programming. In Advances in neural information processing systems, pages 41--48, 2005

work page 2005

[14] [14]

Sparse pca via covariance thresholding

Yash Deshpande and Andrea Montanari. Sparse pca via covariance thresholding. The Journal of Machine Learning Research, 17 0 (1): 0 4913--4953, 2016

work page 2016

[15] [15]

A convex integer programming approach for optimal sparse pca

Santanu S Dey, Rahul Mazumder, and Guanyi Wang. A convex integer programming approach for optimal sparse pca. arXiv preprint arXiv:1810.09062, 2018

work page arXiv 2018

[16] [16]

An outer-approximation algorithm for a class of mixed-integer nonlinear programs

Marco A Duran and Ignacio E Grossmann. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical programming, 36 0 (3): 0 307--339, 1986

work page 1986

[17] [17]

Optimal solutions for sparse principal component analysis

Alexandre d’Aspremont, Francis Bach, and Laurent El Ghaoui. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9 0 (Jul): 0 1269--1294, 2008

work page 2008

[18] [18]

When is best subset selection the" best"? arXiv preprint arXiv:2007.01478, 2020

Jianqing Fan, Yongyi Guo, and Ziwei Zhu. When is best subset selection the" best"? arXiv preprint arXiv:2007.01478, 2020

work page arXiv 2007

[19] [19]

Frangioni and C

A. Frangioni and C. Gentile. Perspective cuts for a class of convex 0–1 mixed integer programs. Math. Program., 106 0 (2): 0 225–236, May 2006. ISSN 0025-5610

work page 2006

[20] [20]

Sparse principal component analysis via axis-aligned random projections

Milana Gataric, Tengyao Wang, and Richard J Samworth. Sparse principal component analysis via axis-aligned random projections. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82 0 (2): 0 329--359, 2020

work page 2020

[21] [21]

Perspective reformulations of mixed integer nonlinear programs with indicator variables

Oktay G \"u nl \"u k and Jeff Linderoth. Perspective reformulations of mixed integer nonlinear programs with indicator variables. Mathematical programming, 124 0 (1): 0 183--205, 2010

work page 2010

[22] [22]

Result analysis of the nips 2003 feature selection challenge

Isabelle Guyon, Steve Gunn, Asa Ben-Hur, and Gideon Dror. Result analysis of the nips 2003 feature selection challenge. In Advances in Neural Information Processing Systems, volume 17. MIT Press, 2005

work page 2003

[23] [23]

Statistical learning with sparsity: the lasso and generalizations

Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity: the lasso and generalizations. Chapman and Hall/CRC, 2019

work page 2019

[24] [24]

Fast best subset selection: Coordinate descent and local combinatorial optimization algorithms

Hussein Hazimeh and Rahul Mazumder. Fast best subset selection: Coordinate descent and local combinatorial optimization algorithms. Operations Research, 68 0 (5): 0 1517--1537, 2020

work page 2020

[25] [25]

Sparse regression at scale: Branch-and-bound rooted in first-order optimization

Hussein Hazimeh, Rahul Mazumder, and Ali Saab. Sparse regression at scale: Branch-and-bound rooted in first-order optimization. arXiv preprint arXiv:2004.06152, 2020

work page arXiv 2004

[26] [26]

Analysis of a complex of statistical variables into principal components

Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of educational psychology, 24 0 (6): 0 417, 1933

work page 1933

[27] [27]

Johnstone

Iain M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29 0 (2): 0 295--327, 04 2001. doi:10.1214/aos/1009210544. URL https://doi.org/10.1214/aos/1009210544

work page doi:10.1214/aos/1009210544 2001

[28] [28]

On consistency and sparsity for principal components analysis in high dimensions

Iain M Johnstone and Arthur Yu Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104 0 (486): 0 682--693, 2009

work page 2009

[29] [29]

A modified principal component technique based on the lasso

Ian T Jolliffe, Nickolay T Trendafilov, and Mudassir Uddin. A modified principal component technique based on the lasso. Journal of computational and Graphical Statistics, 12 0 (3): 0 531--547, 2003

work page 2003

[30] [30]

Do semidefinite relaxations solve sparse pca up to the information limit? The Annals of Statistics, 43 0 (3): 0 1300--1322, 2015

Robert Krauthgamer, Boaz Nadler, Dan Vilenchik, et al. Do semidefinite relaxations solve sparse pca up to the information limit? The Annals of Statistics, 43 0 (3): 0 1300--1322, 2015

work page 2015

[31] [31]

Jing Lei and Vincent Q. Vu. Sparsistency and agnostic inference in sparse PCA . The Annals of Statistics, 43 0 (1): 0 299 -- 322, 2015

work page 2015

[32] [32]

Exact and approximation algorithms for sparse pca

Yongchun Li and Weijun Xie. Exact and approximation algorithms for sparse pca. arXiv preprint arXiv:2008.12438, 2020

work page arXiv 2008

[33] [33]

Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint

Ronny Luss and Marc Teboulle. Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. siam REVIEW, 55 0 (1): 0 65--98, 2013

work page 2013

[34] [34]

High-dimensional graphs and variable selection with the Lasso

Nicolai Meinshausen and Peter Bühlmann. High-dimensional graphs and variable selection with the Lasso . The Annals of Statistics, 34 0 (3): 0 1436 -- 1462, 2006

work page 2006

[35] [35]

Spectral bounds for sparse pca: Exact and greedy algorithms

Baback Moghaddam, Yair Weiss, and Shai Avidan. Spectral bounds for sparse pca: Exact and greedy algorithms. In Advances in neural information processing systems, pages 915--922, 2006

work page 2006

[36] [36]

Asymptotics of sample eigenstructure for a large dimensional spiked covariance model

Debashis Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17 0 (4): 0 1617--1642, 2007

work page 2007

[37] [37]

Raskutti , M

G. Raskutti , M. J. Wainwright , and B. Yu . Minimax rates of estimation for high-dimensional linear regression over _q -balls. IEEE Transactions on Information Theory, 57 0 (10): 0 6976--6994, 2011

work page 2011

[38] [38]

Alternating maximization: Unifying framework for 8 sparse pca formulations and efficient parallel codes

Peter Richt \'a rik, Majid Jahani, Selin Damla Ahipa s ao g lu, and Martin Tak \'a c . Alternating maximization: Unifying framework for 8 sparse pca formulations and efficient parallel codes. Optimization and Engineering, pages 1--27, 2020

work page 2020

[39] [39]

High dimensional statistics

Phillippe Rigollet and Jan-Christian H \"u tter. High dimensional statistics. Lecture notes for course 18S997, 2015

work page 2015

[40] [40]

Hanson-Wright inequality and sub-gaussian concentration

Mark Rudelson and Roman Vershynin. Hanson-Wright inequality and sub-gaussian concentration . Electronic Communications in Probability, 18 0 (none): 0 1 -- 9, 2013

work page 2013

[41] [41]

Stewart and Ji-guang Sun

G.W. Stewart and Ji-guang Sun. Matrix Perturbation Theory. Computer science and scientific computing. Academic Press, 1990

work page 1990

[42] [42]

High-dimensional probability: An introduction with applications in data science, volume 47

Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018

work page 2018

[43] [43]

Samworth

Tengyao Wang, Quentin Berthet, and Richard J. Samworth. Statistical and computational trade-offs in estimation of sparse principal components . The Annals of Statistics, 44 0 (5): 0 1896 -- 1930, 2016

work page 1930

[44] [44]

Perturbation bounds in connection with singular value decomposition

Per- ke Wedin. Perturbation bounds in connection with singular value decomposition. BIT Numerical Mathematics, 12 0 (1): 0 99--111, 1972

work page 1972

[45] [45]

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis

Daniela M Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10 0 (3): 0 515--534, 2009

work page 2009

[46] [46]

Truncated power method for sparse eigenvalue problems

Xiao-Tong Yuan and Tong Zhang. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research, 14 0 (Apr): 0 899--925, 2013

work page 2013

[47] [47]

The Schur complement and its applications, volume 4

Fuzhen Zhang. The Schur complement and its applications, volume 4. Springer Science & Business Media, 2006

work page 2006

[48] [48]

Sparse principal component analysis

Hui Zou, Trevor Hastie, and Robert Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15 0 (2): 0 265--286, 2006

work page 2006