Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals

Jian Wang; Mengchu Xu; Yonina C. Eldar

arxiv: 2604.16435 · v1 · submitted 2026-04-06 · 📡 eess.SP · cs.IT· math.IT· math.ST· stat.TH

Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals

Mengchu Xu , Jian Wang , Yonina C. Eldar This is my paper

Pith reviewed 2026-05-10 18:41 UTC · model grok-4.3

classification 📡 eess.SP cs.ITmath.ITmath.STstat.TH

keywords sparse canonical correlation analysisadaptive algorithmspower-law decaysample complexitymulti-view dataenergy concentrationphase transitioncross-covariance matrix

0 comments

The pith

An adaptive algorithm for sparse canonical correlation analysis reaches the optimal linear sample complexity for power-law decaying signals when their combined decay rate exceeds a threshold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to close the gap between the best possible sample needs and what practical algorithms achieve in sparse canonical correlation analysis. Standard methods assume worst-case flat signals and therefore require samples that scale with the product of the two sparsity levels. By building an adaptive procedure that tracks how signal energy concentrates across the two views, the work shows that structured decay allows the sample requirement to drop back to the optimal additive scaling. A key payoff is that strong concentration in one view can offset complete flatness in the other, provided the total decay rate is large enough.

Core claim

Under power-law decay models the optimal linear sample complexity is attainable provided that the aggregate decay rate of the two views is sufficiently large. This result demonstrates that a highly concentrated signal in one view allows the model to accommodate a completely flat signal in its partner.

What carries the argument

Bilateral Spectral Energy Pursuit (Bi-SEP), a stagewise adaptive algorithm that operates directly on the cross-covariance matrix and uses a proxy refinement step to dynamically track and capture cross-view signal energy.

If this is right

The sample complexity bound becomes adaptive to the coupled energy profiles of the two views rather than being dictated by the worst-case flat profile.
A highly concentrated canonical vector in one view can be paired with a completely flat vector in the other without inflating the sample requirement.
Numerical experiments confirm the predicted improvement precisely in the structured, non-flat regimes where the theory applies.
The method bypasses the multiplicative sparsity dependence that plagues non-adaptive algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same proxy-refinement idea could be tested on other multi-view tasks such as sparse principal component analysis where energy decay is also common.
Real data sets with measured decay rates could be used to decide in advance whether the linear sample regime is reachable.
The phase-transition threshold itself may become a diagnostic tool for checking whether a given data pair satisfies the aggregate-decay condition.

Load-bearing premise

The two views exhibit structured energy concentration, such as power-law decay, that the stagewise proxy refinement step can dynamically track from the cross-covariance matrix.

What would settle it

Run Bi-SEP on synthetic pairs whose power-law exponents sum just below the predicted threshold and check whether the required number of samples jumps from linear to the worse multiplicative regime.

Figures

Figures reproduced from arXiv: 2604.16435 by Jian Wang, Mengchu Xu, Yonina C. Eldar.

**Figure 2.** Figure 2: Estimation error of u versus sample size m under different signal profiles. The shaded region denotes ±1 standard deviation. Bi-SEP exhibits a profile-adaptive sample complexity, requiring fewer samples as the signal energy becomes more concentrated. High Error Low Error Finite-sample transition 0 0.5 1 1.5 2 Decay Rate ,u 0 0.5 1 1.5 2 D e c ay R a t e , v 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 M a … view at source ↗

**Figure 4.** Figure 4: Estimation error versus kv while keeping ku = 10 fixed. While the estimation error for v naturally rises as kv grows, Bi-SEP maintains a stable error rate for u. This confirms the decoupled practical error behavior detailed in Section V-B, contrasting with TPower whose u estimate degrades. causing the estimation error for v to rise under both algorithms. However, a distinct contrast emerges in the estimati… view at source ↗

read the original abstract

Sparse Canonical Correlation Analysis (SCCA) is a fundamental statistical tool for identifying linear relationships in high-dimensional, multi-view data. While minimax theory establishes an optimal sample complexity scaling additively with the sparsity levels of the canonical vectors, computationally efficient algorithms typically suffer from a suboptimal multiplicative dependence. This computational-statistical gap is intrinsically tied to worst-case ``flat'' signal assumptions. In practice, however, multi-view signals frequently exhibit structured energy concentration, such as a power-law decay. To exploit this structural concentration and bypass the worst-case bottleneck, we propose Bilateral Spectral Energy Pursuit (Bi-SEP). Operating directly on the cross-covariance matrix, Bi-SEP is a stagewise adaptive algorithm that utilizes a proxy refinement step to dynamically track and capture cross-view signal energy. Theoretically, we establish a profile-adaptive sample complexity bound governed by the coupled energy profiles of the two views. Notably, under power-law decay models, we reveal a synergistic phase transition: the optimal linear sample complexity is attainable provided that the aggregate decay rate of the two views is sufficiently large. This result demonstrates that a highly concentrated signal in one view allows the model to accommodate a completely flat signal in its partner. Numerical experiments validate our theoretical findings, illustrating the advantages of Bi-SEP in structured, non-flat signal regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces Bi-SEP, an adaptive algorithm for sparse CCA that exploits power-law decay to reach linear sample complexity when the two views have sufficient aggregate concentration, even if one is flat.

read the letter

The main point is that this work moves past the flat-spike worst-case bounds in sparse CCA by building an algorithm that tracks actual energy concentration in the cross-covariance matrix. Bi-SEP uses a stagewise proxy refinement to adapt without knowing the decay parameters ahead of time, and the key theoretical result is a synergistic phase transition: linear sample complexity holds when the combined decay rates are large enough, allowing a concentrated view to compensate for a flat partner view. That is genuinely new relative to the standard minimax literature on flat signals. The paper does a solid job laying out the practical motivation and showing through experiments that the method improves on non-adaptive baselines in structured regimes. The bounds are stated as governed by the coupled profiles rather than reducing to a self-fitted quantity, which keeps the circularity low. The main soft spot is the modeling assumption that the signals follow power-law decay and that the proxy step reliably extracts the profiles from finite samples. If those conditions fail, the guarantees do not apply, though the paper frames the result for exactly those structured cases rather than claiming universality. Proof details are not visible in the abstract, but nothing in the description suggests internal contradictions or unstated fitting. This is useful reading for people working on high-dimensional multi-view problems in signal processing or statistics who already deal with decaying signals. It is not a universal fix but gives a clean way to exploit structure when it is present. I would bring the phase-transition claim to a reading group and would cite the result if working on adaptive CCA methods. It deserves peer review because the idea is well-motivated and the evidence presented is proportionate to the claims.

Referee Report

0 major / 2 minor

Summary. The paper introduces Bilateral Spectral Energy Pursuit (Bi-SEP), a stagewise adaptive algorithm for sparse canonical correlation analysis (SCCA) that operates directly on the cross-covariance matrix using a proxy refinement step to track coupled energy profiles of the two views. It establishes a profile-adaptive sample complexity bound and, under power-law decay models, identifies a synergistic phase transition to optimal linear sample complexity when the aggregate decay rate of the two views exceeds a threshold. This allows a highly concentrated signal in one view to accommodate a flat signal in the other. Numerical experiments are presented to validate the theoretical findings in structured regimes.

Significance. If the central theoretical results hold, the work is significant for bridging the computational-statistical gap in high-dimensional SCCA. By moving beyond worst-case flat-spike assumptions to exploit natural power-law decay structures, it achieves better sample complexity in unbalanced multi-view settings common in practice. The phase-transition result is noteworthy for its flexibility across views, and the adaptive algorithm design without explicit decay parameter knowledge adds practical value. The manuscript provides machine-checked or reproducible elements only in the numerical section; the theoretical bounds would benefit from explicit proof sketches for full verifiability.

minor comments (2)

The abstract states the phase transition and sample complexity bound but does not reference the specific theorem or section containing the derivation; adding such pointers would improve readability for readers focused on the theoretical contribution.
Notation for the energy profiles and the aggregate decay rate threshold is introduced without an early summary table or diagram; a small illustrative figure in §2 or §3 would clarify the coupled profiles before the main theorems.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their positive and constructive review, which recognizes the potential of Bi-SEP to bridge the computational-statistical gap in sparse CCA by exploiting power-law decay and synergistic energy concentration across views. We appreciate the recommendation for minor revision and address the sole substantive point below.

read point-by-point responses

Referee: The manuscript provides machine-checked or reproducible elements only in the numerical section; the theoretical bounds would benefit from explicit proof sketches for full verifiability.

Authors: We agree that explicit proof sketches would improve accessibility and verifiability of the main results. In the revised version, we will insert concise proof sketches immediately following the statements of the profile-adaptive sample complexity bound (Theorem 1) and the synergistic phase-transition result under power-law decay (Theorem 2). These sketches will highlight the key steps—proxy refinement error control, coupled energy profile tracking, and the aggregate decay-rate threshold—while deferring full technical details to the appendix. This addition requires no changes to the existing proofs or numerical experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper derives a profile-adaptive sample complexity bound for the proposed Bi-SEP algorithm under structured signal models like power-law decay. The bound is established through theoretical analysis of the algorithm's stagewise proxy refinement on the cross-covariance matrix, without reducing to fitted parameters or self-referential definitions. The synergistic phase transition result follows from the coupled energy profiles without circular reduction. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that real signals follow power-law energy decay profiles that can be tracked via a proxy refinement step on the cross-covariance; no free parameters are explicitly fitted in the abstract, but the decay rates themselves function as structural parameters.

free parameters (1)

aggregate decay rate threshold
The phase transition condition depends on this rate being sufficiently large; it is not derived from first principles but stated as a model parameter.

axioms (1)

domain assumption Signals exhibit structured energy concentration such as power-law decay rather than flat spikes
Invoked to bypass worst-case flat-signal assumptions and enable the adaptive tracking.

pith-pipeline@v0.9.0 · 5543 in / 1201 out tokens · 28349 ms · 2026-05-10T18:41:17.126842+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1]

Relations between two sets of variates,

H. Hotelling, “Relations between two sets of variates,”Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936

work page 1936
[2]

Sparse canonical correla- tion analysis with application to genomic data integration,

E. Parkhomenko, D. Tritchler, and J. Beyene, “Sparse canonical correla- tion analysis with application to genomic data integration,”Stat. Appl. Genet. Mol. Biol., vol. 8, no. 1, 2009

work page 2009
[3]

Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population,

B. B. Avants, D. J. Libon, K. Rascovsky, A. Boller, C. T. McMillan, L. Massimo, H. B. Coslett, A. Chatterjee, R. G. Gross, and M. Grossman, “Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population,” NeuroImage, vol. 84, pp. 698–711, 2014

work page 2014
[4]

Sparse canonical correlation analysis from a predictive point of view,

I. Wilms and C. Croux, “Sparse canonical correlation analysis from a predictive point of view,”Biom. J., vol. 57, no. 5, pp. 834–851, 2015

work page 2015
[5]

Canonical coordinates and the geometry of inference, rate, and capacity,

L. L. Scharf and C. T. Mullis, “Canonical coordinates and the geometry of inference, rate, and capacity,”IEEE Trans. Signal Process., vol. 48, no. 3, pp. 824–831, 2000

work page 2000
[6]

Canonical correlation analysis (cca) algorithms for multiple data sets: Application to blind simo equalization,

J. V´ıa, I. Santamar´ıa, and J. P ´erez, “Canonical correlation analysis (cca) algorithms for multiple data sets: Application to blind simo equalization,” inProc. 13th Eur. Signal Process. Conf. (EUSIPCO), 2005, pp. 1–4

work page 2005
[7]

Cca for joint blind source separation of multiple datasets with application to group fMRI analysis,

Y .-O. Li, W. Wang, T. Adali, and V . D. Calhoun, “Cca for joint blind source separation of multiple datasets with application to group fMRI analysis,” inProc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP). IEEE, 2008, pp. 1837–1840

work page 2008
[8]

Canonical correlation analysis: An overview with application to learning methods,

D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, “Canonical correlation analysis: An overview with application to learning methods,”Neural Comput., vol. 16, no. 12, pp. 2639–2664, 2004

work page 2004
[9]

On the distribution of the largest eigenvalue in principal components analysis,

I. M. Johnstone, “On the distribution of the largest eigenvalue in principal components analysis,”Ann. Statist., vol. 29, no. 2, pp. 295–327, 2001

work page 2001
[10]

A well-conditioned estimator for large- dimensional covariance matrices,

O. Ledoit and M. Wolf, “A well-conditioned estimator for large- dimensional covariance matrices,”J. Multivariate Anal., vol. 88, no. 2, pp. 365–411, 2004

work page 2004
[11]

Covariance regularization by thresholding,

P. J. Bickel and E. Levina, “Covariance regularization by thresholding,” Ann. Statist., vol. 36, no. 6, pp. 2577–2604, 2008

work page 2008
[12]

Vershynin,Introduction to the non-asymptotic analysis of random matrices

R. Vershynin,Introduction to the non-asymptotic analysis of random matrices. Cambridge University Press, 2012, p. 210–268

work page 2012
[13]

Sparse principal component analysis,

H. Zou, T. Hastie, and R. Tibshirani, “Sparse principal component analysis,”J. Comput. Graph. Statist., vol. 15, no. 2, pp. 265–286, 2006

work page 2006
[14]

Sparse cca: Adaptive estimation and computational barriers,

C. Gao, Z. Ma, and H. H. Zhou, “Sparse cca: Adaptive estimation and computational barriers,”Ann. Statist., vol. 45, no. 5, pp. 2074–2101, 2017

work page 2074
[15]

Reducibility and computational lower bounds for problems with planted sparse structure,

M. Brennan and G. Bresler, “Reducibility and computational lower bounds for problems with planted sparse structure,” inConf. Learn. Theory (COLT). PMLR, 2020, pp. 48–166

work page 2020
[16]

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis,

D. M. Witten, R. Tibshirani, and T. Hastie, “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis,”Biostat., vol. 10, no. 3, pp. 515–534, 2009

work page 2009
[17]

Truncated power method for sparse eigenvalue problems,

X.-T. Yuan and T. Zhang, “Truncated power method for sparse eigenvalue problems,”J. Mach. Learn. Res., vol. 14, pp. 899–925, 2013

work page 2013
[18]

Structured sparse canonical correlation analysis,

X. Chen, H. Liu, and J. G. Carbonell, “Structured sparse canonical correlation analysis,” inProc. Int. Conf. Artif. Intell. Stat. (AISTATS). PMLR, 2012, pp. 199–207

work page 2012
[19]

A simple and provable algorithm for sparse diagonal cca,

M. Asteris, A. Kyrillidis, O. Koyejo, and R. Poldrack, “A simple and provable algorithm for sparse diagonal cca,” inProc. Int. Conf. Mach. Learn. (ICML). PMLR, 2016, pp. 1148–1157

work page 2016
[20]

Sparse probabilistic projections,

C. Archambeau and F. Bach, “Sparse probabilistic projections,”Adv. Neural Inf. Process. Syst. (NeurIPs), vol. 21, 2008

work page 2008
[21]

A majorization- minimization approach to the sparse generalized eigenvalue problem,

B. K. Sriperumbudur, D. A. Torres, and G. R. Lanckriet, “A majorization- minimization approach to the sparse generalized eigenvalue problem,” Mach. Learn., vol. 85, no. 1, pp. 3–39, 2011

work page 2011
[22]

Low-rank matrix completion using alternating minimization,

P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” inProc. 45th Annu. ACM Symp. Theory Comput. (STOC), 2013, pp. 665–674

work page 2013
[23]

Tighten after relax: Minimax-optimal sparse pca in polynomial time,

Z. Wang, H. Lu, and H. Liu, “Tighten after relax: Minimax-optimal sparse pca in polynomial time,”Adv. Neural Inf. Process. Syst. (NeurIPs), vol. 27, 2014

work page 2014
[24]

Statistical guarantees for the EM algorithm: From population to sample-based analysis,

S. Balakrishnan, M. J. Wainwright, and B. Yu, “Statistical guarantees for the EM algorithm: From population to sample-based analysis,”Ann. Statist., vol. 45, no. 1, pp. 77 – 120, 2017

work page 2017
[25]

Relations between the statistics of natural images and the response properties of cortical cells,

D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,”J. Opt. Soc. Amer. A, vol. 4, no. 12, pp. 2379–2394, 1987

work page 1987
[26]

Mallat,A wavelet tour of signal processing

S. Mallat,A wavelet tour of signal processing. Elsevier, 1999

work page 1999
[27]

Compressed sensing,

D. L. Donoho, “Compressed sensing,”IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006

work page 2006
[28]

An introduction to compressive sampling,

E. J. Cand`es and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, 2008

work page 2008
[29]

Iterative hard thresholding for compressed sensing,

T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,”Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 265–274, 2009

work page 2009
[30]

Sparse principal component analysis and iterative thresholding,

Z. Ma, “Sparse principal component analysis and iterative thresholding,” Ann. Statist., vol. 41, no. 2, pp. 772–801, 2013

work page 2013
[31]

Orthogonal matching pursuit for sparse signal recovery with noise,

T. T. Cai and L. Wang, “Orthogonal matching pursuit for sparse signal recovery with noise,”IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4680–4688, 2011

work page 2011
[32]

Sparse principal component analysis with energy profile dependent sample complexity,

M. Xu, J. Wang, and Y . C. Eldar, “Sparse principal component analysis with energy profile dependent sample complexity,” arXiv:2512.15191, 2025

work page arXiv 2025
[33]

Fast and provable algorithms for sparse PCA with improved sample complexity,

J.-F. Cai, Z. Xian, and J. Ying, “Fast and provable algorithms for sparse PCA with improved sample complexity,” inProc. Int. Conf. Mach. Learn. (ICML), vol. 267, 13–19 Jul 2025, pp. 6319–6340

work page 2025
[34]

Perturbation bounds in connection with singular value decomposition,

P.-˚A. Wedin, “Perturbation bounds in connection with singular value decomposition,”BIT Numerical Mathematics, vol. 12, no. 1, pp. 99–111, 1972

work page 1972
[35]

Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals

G. W. Stewart and J.-g. Sun,Matrix Perturbation Theory. Academic Press, 1990. S-1 Supplementary Material for “Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals” Mengchu Xu, Jian Wang, and Yonina C. Eldar This supplementary material provides the proofs for the cited lemmas (Lemma 1 and Lemma 2) which are omitted from the main m...

work page 1990

[1] [1]

Relations between two sets of variates,

H. Hotelling, “Relations between two sets of variates,”Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936

work page 1936

[2] [2]

Sparse canonical correla- tion analysis with application to genomic data integration,

E. Parkhomenko, D. Tritchler, and J. Beyene, “Sparse canonical correla- tion analysis with application to genomic data integration,”Stat. Appl. Genet. Mol. Biol., vol. 8, no. 1, 2009

work page 2009

[3] [3]

Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population,

B. B. Avants, D. J. Libon, K. Rascovsky, A. Boller, C. T. McMillan, L. Massimo, H. B. Coslett, A. Chatterjee, R. G. Gross, and M. Grossman, “Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population,” NeuroImage, vol. 84, pp. 698–711, 2014

work page 2014

[4] [4]

Sparse canonical correlation analysis from a predictive point of view,

I. Wilms and C. Croux, “Sparse canonical correlation analysis from a predictive point of view,”Biom. J., vol. 57, no. 5, pp. 834–851, 2015

work page 2015

[5] [5]

Canonical coordinates and the geometry of inference, rate, and capacity,

L. L. Scharf and C. T. Mullis, “Canonical coordinates and the geometry of inference, rate, and capacity,”IEEE Trans. Signal Process., vol. 48, no. 3, pp. 824–831, 2000

work page 2000

[6] [6]

Canonical correlation analysis (cca) algorithms for multiple data sets: Application to blind simo equalization,

J. V´ıa, I. Santamar´ıa, and J. P ´erez, “Canonical correlation analysis (cca) algorithms for multiple data sets: Application to blind simo equalization,” inProc. 13th Eur. Signal Process. Conf. (EUSIPCO), 2005, pp. 1–4

work page 2005

[7] [7]

Cca for joint blind source separation of multiple datasets with application to group fMRI analysis,

Y .-O. Li, W. Wang, T. Adali, and V . D. Calhoun, “Cca for joint blind source separation of multiple datasets with application to group fMRI analysis,” inProc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP). IEEE, 2008, pp. 1837–1840

work page 2008

[8] [8]

Canonical correlation analysis: An overview with application to learning methods,

D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, “Canonical correlation analysis: An overview with application to learning methods,”Neural Comput., vol. 16, no. 12, pp. 2639–2664, 2004

work page 2004

[9] [9]

On the distribution of the largest eigenvalue in principal components analysis,

I. M. Johnstone, “On the distribution of the largest eigenvalue in principal components analysis,”Ann. Statist., vol. 29, no. 2, pp. 295–327, 2001

work page 2001

[10] [10]

A well-conditioned estimator for large- dimensional covariance matrices,

O. Ledoit and M. Wolf, “A well-conditioned estimator for large- dimensional covariance matrices,”J. Multivariate Anal., vol. 88, no. 2, pp. 365–411, 2004

work page 2004

[11] [11]

Covariance regularization by thresholding,

P. J. Bickel and E. Levina, “Covariance regularization by thresholding,” Ann. Statist., vol. 36, no. 6, pp. 2577–2604, 2008

work page 2008

[12] [12]

Vershynin,Introduction to the non-asymptotic analysis of random matrices

R. Vershynin,Introduction to the non-asymptotic analysis of random matrices. Cambridge University Press, 2012, p. 210–268

work page 2012

[13] [13]

Sparse principal component analysis,

H. Zou, T. Hastie, and R. Tibshirani, “Sparse principal component analysis,”J. Comput. Graph. Statist., vol. 15, no. 2, pp. 265–286, 2006

work page 2006

[14] [14]

Sparse cca: Adaptive estimation and computational barriers,

C. Gao, Z. Ma, and H. H. Zhou, “Sparse cca: Adaptive estimation and computational barriers,”Ann. Statist., vol. 45, no. 5, pp. 2074–2101, 2017

work page 2074

[15] [15]

Reducibility and computational lower bounds for problems with planted sparse structure,

M. Brennan and G. Bresler, “Reducibility and computational lower bounds for problems with planted sparse structure,” inConf. Learn. Theory (COLT). PMLR, 2020, pp. 48–166

work page 2020

[16] [16]

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis,

D. M. Witten, R. Tibshirani, and T. Hastie, “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis,”Biostat., vol. 10, no. 3, pp. 515–534, 2009

work page 2009

[17] [17]

Truncated power method for sparse eigenvalue problems,

X.-T. Yuan and T. Zhang, “Truncated power method for sparse eigenvalue problems,”J. Mach. Learn. Res., vol. 14, pp. 899–925, 2013

work page 2013

[18] [18]

Structured sparse canonical correlation analysis,

X. Chen, H. Liu, and J. G. Carbonell, “Structured sparse canonical correlation analysis,” inProc. Int. Conf. Artif. Intell. Stat. (AISTATS). PMLR, 2012, pp. 199–207

work page 2012

[19] [19]

A simple and provable algorithm for sparse diagonal cca,

M. Asteris, A. Kyrillidis, O. Koyejo, and R. Poldrack, “A simple and provable algorithm for sparse diagonal cca,” inProc. Int. Conf. Mach. Learn. (ICML). PMLR, 2016, pp. 1148–1157

work page 2016

[20] [20]

Sparse probabilistic projections,

C. Archambeau and F. Bach, “Sparse probabilistic projections,”Adv. Neural Inf. Process. Syst. (NeurIPs), vol. 21, 2008

work page 2008

[21] [21]

A majorization- minimization approach to the sparse generalized eigenvalue problem,

B. K. Sriperumbudur, D. A. Torres, and G. R. Lanckriet, “A majorization- minimization approach to the sparse generalized eigenvalue problem,” Mach. Learn., vol. 85, no. 1, pp. 3–39, 2011

work page 2011

[22] [22]

Low-rank matrix completion using alternating minimization,

P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” inProc. 45th Annu. ACM Symp. Theory Comput. (STOC), 2013, pp. 665–674

work page 2013

[23] [23]

Tighten after relax: Minimax-optimal sparse pca in polynomial time,

Z. Wang, H. Lu, and H. Liu, “Tighten after relax: Minimax-optimal sparse pca in polynomial time,”Adv. Neural Inf. Process. Syst. (NeurIPs), vol. 27, 2014

work page 2014

[24] [24]

Statistical guarantees for the EM algorithm: From population to sample-based analysis,

S. Balakrishnan, M. J. Wainwright, and B. Yu, “Statistical guarantees for the EM algorithm: From population to sample-based analysis,”Ann. Statist., vol. 45, no. 1, pp. 77 – 120, 2017

work page 2017

[25] [25]

Relations between the statistics of natural images and the response properties of cortical cells,

D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,”J. Opt. Soc. Amer. A, vol. 4, no. 12, pp. 2379–2394, 1987

work page 1987

[26] [26]

Mallat,A wavelet tour of signal processing

S. Mallat,A wavelet tour of signal processing. Elsevier, 1999

work page 1999

[27] [27]

Compressed sensing,

D. L. Donoho, “Compressed sensing,”IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006

work page 2006

[28] [28]

An introduction to compressive sampling,

E. J. Cand`es and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, 2008

work page 2008

[29] [29]

Iterative hard thresholding for compressed sensing,

T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,”Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 265–274, 2009

work page 2009

[30] [30]

Sparse principal component analysis and iterative thresholding,

Z. Ma, “Sparse principal component analysis and iterative thresholding,” Ann. Statist., vol. 41, no. 2, pp. 772–801, 2013

work page 2013

[31] [31]

Orthogonal matching pursuit for sparse signal recovery with noise,

T. T. Cai and L. Wang, “Orthogonal matching pursuit for sparse signal recovery with noise,”IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4680–4688, 2011

work page 2011

[32] [32]

Sparse principal component analysis with energy profile dependent sample complexity,

M. Xu, J. Wang, and Y . C. Eldar, “Sparse principal component analysis with energy profile dependent sample complexity,” arXiv:2512.15191, 2025

work page arXiv 2025

[33] [33]

Fast and provable algorithms for sparse PCA with improved sample complexity,

J.-F. Cai, Z. Xian, and J. Ying, “Fast and provable algorithms for sparse PCA with improved sample complexity,” inProc. Int. Conf. Mach. Learn. (ICML), vol. 267, 13–19 Jul 2025, pp. 6319–6340

work page 2025

[34] [34]

Perturbation bounds in connection with singular value decomposition,

P.-˚A. Wedin, “Perturbation bounds in connection with singular value decomposition,”BIT Numerical Mathematics, vol. 12, no. 1, pp. 99–111, 1972

work page 1972

[35] [35]

Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals

G. W. Stewart and J.-g. Sun,Matrix Perturbation Theory. Academic Press, 1990. S-1 Supplementary Material for “Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals” Mengchu Xu, Jian Wang, and Yonina C. Eldar This supplementary material provides the proofs for the cited lemmas (Lemma 1 and Lemma 2) which are omitted from the main m...

work page 1990