pith. sign in

arxiv: 2510.08117 · v2 · submitted 2025-10-09 · 💻 cs.IT · math.IT· stat.ML

Near-optimal Rank Adaptive Inference of High Dimensional Matrices

Pith reviewed 2026-05-18 08:44 UTC · model grok-4.3

classification 💻 cs.IT math.ITstat.ML
keywords high-dimensional matrix estimationrank-adaptive algorithmssingular value thresholdinginstance-specific boundslinear measurementsfinite-sample analysismultivariate regressionsystem identification
0
0 comments X

The pith

A rank-adaptive algorithm using least-squares and singular value thresholding nearly achieves the fundamental limits for estimating high-dimensional matrices from linear measurements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to create algorithms that estimate a large matrix from noisy linear measurements by deciding on the fly how many of its main components to fit precisely. It shows that the best number of components to estimate depends on the unknown matrix, the number of measurements, and the noise strength, creating a trade-off between fitting the large parts well and accepting error on the small parts. The authors derive lower bounds on the number of measurements needed for any such adaptive method and introduce an algorithm that gets close to these bounds with explicit error guarantees. This matters because it allows near-optimal performance without knowing the matrix structure ahead of time, improving results in tasks like regression and system identification.

Core claim

The paper establishes instance-specific lower bounds on sample complexity for rank-adaptive matrix estimation and proposes an algorithm combining least-squares estimation with universal singular value thresholding that attains finite-sample error bounds nearly matching these bounds.

What carries the argument

The effective rank selection procedure that adaptively balances the precision in estimating a subset of singular values against the approximation error for the remaining singular values.

If this is right

  • The optimal effective rank depends on the specific matrix, sample size, and noise level.
  • Performance nearly matches fundamental limits across different instances without additional assumptions on singular value decay.
  • This enables better estimation in applications such as multivariate regression and linear dynamical system identification.
  • Finite-sample bounds provide practical performance guarantees for finite data settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method might extend to estimating other structured objects like tensors where rank adaptation is beneficial.
  • The identified trade-off could inform adaptive procedures in related high-dimensional problems such as covariance estimation.
  • Empirical validation on real-world datasets would test if the near-optimality translates beyond theoretical settings.

Load-bearing premise

That an effective rank can be chosen from the data to realize the optimal trade-off between singular value estimation precision and tail approximation error for any matrix without stated assumptions on singular value decay rates.

What would settle it

Observe whether the proposed algorithm's estimation error exceeds the derived lower bound by a large factor for matrices with varying singular value profiles as the number of samples increases.

Figures

Figures reproduced from arXiv: 2510.08117 by Alexandre Proutiere, Fr\'ed\'eric Zheng, Yassir Jedra.

Figure 1
Figure 1. Figure 1: Frobenius error (left) and effective rank (right) vs noise level [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: depicts the behavior of both upper bounds as a function of τ . We can observe that our upper bound is indeed smaller, and stays bounded between r∥Z∥ 2 2 (corresponding to k(τ ) = r) and ∥A∥ 2 F (corresponding to k(τ ) = 0) [PITH_FULL_IMAGE:figures/full_fig_p033_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Multivariate regression: Frobenius error (left) and effective rank (right) vs noise level [PITH_FULL_IMAGE:figures/full_fig_p034_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: System identification: Frobenius error (left) and effective rank (right) vs noise level [PITH_FULL_IMAGE:figures/full_fig_p034_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Performance of T-LSE and its corresponding error upper bound as a function of n. 2. As a function of d (n = 5000, r = 10). We observe in [PITH_FULL_IMAGE:figures/full_fig_p035_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Performance of T-LSE and its corresponding error upper bound as a function of d. Overall, our error upper bound reflects the dependence of T-LSE’s performance on the system parameters, but it is not always tight. Deriving sharper bounds remains an open direction for future work. In particular, an interesting step would be to replace, in our Theorems 6.3 and 6.6, λmin(Σ) ˆ by λ H k (Σ) ˆ . E.5 Computing res… view at source ↗
read the original abstract

We address the problem of estimating a high-dimensional matrix from linear measurements, with a focus on designing optimal rank-adaptive algorithms. These algorithms infer the matrix by estimating its singular values and the corresponding singular vectors up to an effective rank, adaptively determined based on the data. We establish instance-specific lower bounds for the sample complexity of such algorithms, uncovering fundamental trade-offs in selecting the effective rank: balancing the precision of estimating a subset of singular values against the approximation cost incurred for the remaining ones. Our analysis identifies how the optimal effective rank depends on the matrix being estimated, the sample size, and the noise level. We propose an algorithm that combines a Least-Squares estimator with a universal singular value thresholding procedure. We provide finite-sample error bounds for this algorithm and demonstrate that its performance nearly matches the derived fundamental limits. Our results rely on an enhanced analysis of matrix denoising methods based on singular value thresholding. We validate our findings with applications to multivariate regression and linear dynamical system identification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript derives instance-specific lower bounds on the sample complexity of rank-adaptive matrix estimation from linear measurements, highlighting fundamental trade-offs in selecting an effective rank that balances singular-value estimation precision against tail approximation error. It proposes a least-squares estimator paired with a universal singular-value thresholding procedure, establishes finite-sample error bounds for this algorithm, and shows that the bounds nearly match the derived lower limits. The results are applied to multivariate regression and linear dynamical system identification.

Significance. If the finite-sample upper bounds are shown to hold for arbitrary singular-value profiles without additional decay assumptions, the work would be a solid contribution to adaptive high-dimensional inference: it supplies both instance-specific lower bounds and a matching algorithm with explicit constants, which is stronger than typical minimax results. The emphasis on data-driven rank selection and the enhanced SVT analysis are positive features.

major comments (1)
  1. [Finite-sample upper-bound analysis / proof of Theorem on error bound] The near-optimality claim rests on the adaptive rank choice in the universal SVT step correctly realizing the instance-specific trade-off for arbitrary singular-value profiles. The abstract states no decay assumptions; if the proof of the upper bound (likely in the section containing the finite-sample analysis) implicitly requires a minimum gap, a power-law tail, or similar condition to guarantee that the data-driven threshold selects the right effective rank, then the matching to the lower bounds does not hold in full generality. Please identify the precise location of the argument that the thresholding procedure works for general profiles and state any hidden conditions explicitly.
minor comments (1)
  1. [Algorithm section] The definition of the effective rank and the precise form of the universal thresholding parameter could be stated more explicitly in the algorithm description to aid reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and for the constructive feedback. We appreciate the recognition of the potential significance of the instance-specific lower bounds and the data-driven rank selection. We address the major comment below.

read point-by-point responses
  1. Referee: [Finite-sample upper-bound analysis / proof of Theorem on error bound] The near-optimality claim rests on the adaptive rank choice in the universal SVT step correctly realizing the instance-specific trade-off for arbitrary singular-value profiles. The abstract states no decay assumptions; if the proof of the upper bound (likely in the section containing the finite-sample analysis) implicitly requires a minimum gap, a power-law tail, or similar condition to guarantee that the data-driven threshold selects the right effective rank, then the matching to the lower bounds does not hold in full generality. Please identify the precise location of the argument that the thresholding procedure works for general profiles and state any hidden conditions explicitly.

    Authors: We thank the referee for this important clarification request. The finite-sample error bound for the LS+USVT estimator is stated as Theorem 4.1 in Section 4. The complete proof appears in Appendix C (specifically, Lemmas C.3–C.5 and the main argument in C.6). The argument establishing that the universal threshold selects the effective rank for arbitrary singular-value profiles is contained in the proof of Lemma C.4, which relies only on sub-Gaussian concentration of the singular values of the noise matrix and a uniform deviation bound that holds simultaneously for all singular values without requiring gaps, power-law decay, or any other tail condition on the true singular values. The threshold is set to a universal level (depending only on noise variance, dimensions, and sample size) that dominates the noise singular values with high probability while remaining below the smallest retained signal singular value for the instance-specific effective rank; this comparison is made directly via the triangle inequality and does not invoke additional profile assumptions. No hidden conditions are present. To make this explicit for readers, we will add a short remark immediately after the statement of Theorem 4.1 in the revised manuscript. revision: partial

Circularity Check

0 steps flagged

No significant circularity; bounds and algorithm appear independently derived from first principles

full rationale

The abstract derives instance-specific lower bounds on sample complexity by balancing estimation precision on retained singular values against tail approximation error, then proposes an LS + universal SVT algorithm whose finite-sample error bounds are shown to nearly match those limits. No equations or steps in the provided text reduce a claimed prediction or bound to a fitted parameter by construction, nor do they rely on self-citation chains or imported uniqueness theorems for the central result. The adaptive rank selection is presented as data-driven without explicit reduction to the input data in a tautological way, and the analysis of matrix denoising is described as enhanced but independent. This is the common honest case of a self-contained derivation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The effective rank selection and universal thresholding threshold are likely data-dependent quantities whose precise definition and independence from the target error are not visible.

pith-pipeline@v0.9.0 · 5703 in / 1088 out tokens · 24011 ms · 2026-05-18T08:44:18.830528+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

  1. [1]

    Estimating linear restrictions on regression coefficients for multivariate normal distributions.The Annals of Mathematical Statistics, pages 327–351, 1951

    Theodore Wilbur Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions.The Annals of Mathematical Statistics, pages 327–351, 1951

  2. [2]

    Asymptotic distribution of the reduced rank regression estimator under general conditions.The Annals of Statistics, 27(4):1141–1154, 1999

    Theodore Wilbur Anderson. Asymptotic distribution of the reduced rank regression estimator under general conditions.The Annals of Statistics, 27(4):1141–1154, 1999

  3. [3]

    A new approach to learning linear dynamical systems

    Ainesh Bakshi, Allen Liu, Ankur Moitra, and Morris Yau. A new approach to learning linear dynamical systems. InProceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 335–348, 2023

  4. [4]

    Simple relative deviation bounds for covariance and gram matrices.arXiv preprint arXiv:2410.05754, 2024

    Daniel Barzilai and Ohad Shamir. Simple relative deviation bounds for covariance and gram matrices.arXiv preprint arXiv:2410.05754, 2024

  5. [5]

    Inequalities for the gamma function.Archiv der Mathematik, 91(6):554–563, 2008

    Necdet Batir. Inequalities for the gamma function.Archiv der Mathematik, 91(6):554–563, 2008

  6. [6]

    Optimal selection of reduced rank estimators of high- dimensional matrices.The Annals of Statistics, 39, 04 2010

    Florentina Bunea, Yiyuan She, and Marten Wegkamp. Optimal selection of reduced rank estimators of high- dimensional matrices.The Annals of Statistics, 39, 04 2010

  7. [7]

    Optimal estimation and rank detection for sparse spiked covariance matrices.Probability theory and related fields, 161(3):781–815, 2015

    Tony Cai, Zongming Ma, and Yihong Wu. Optimal estimation and rank detection for sparse spiked covariance matrices.Probability theory and related fields, 161(3):781–815, 2015

  8. [8]

    Exact matrix completion via convex optimization.Commun

    Emmanuel Candès and Benjamin Recht. Exact matrix completion via convex optimization.Commun. ACM, 55(6):111–119, June 2012

  9. [9]

    Matrix completion with noise.Proceedings of the IEEE, 98(6):925–936, 2010

    Emmanuel J Candes and Yaniv Plan. Matrix completion with noise.Proceedings of the IEEE, 98(6):925–936, 2010

  10. [10]

    Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

    Emmanuel J Candes and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

  11. [11]

    Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177– 214, 2015

    Sourav Chatterjee. Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177– 214, 2015

  12. [12]

    Reduced rank regression via adaptive nuclear norm penalization

    Kun Chen, Hongbo Dong, and Kung-Sik Chan. Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 100(4):901–920, 09 2013

  13. [13]

    Non asymptotic estimation lower bounds for lti state space models with cram\’er-rao and van trees.arXiv preprint arXiv:2109.08582, 2021

    Boualem Djehiche and Othmane Mazhar. Non asymptotic estimation lower bounds for lti state space models with cram\’er-rao and van trees.arXiv preprint arXiv:2109.08582, 2021

  14. [14]

    Efficient learning of hidden state lti state space models of unknown order.arXiv preprint arXiv:2202.01625, 2022

    Boualem Djehiche and Othmane Mazhar. Efficient learning of hidden state lti state space models of unknown order.arXiv preprint arXiv:2202.01625, 2022

  15. [15]

    Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences, 37(11):760–766, 1951

    Ky Fan. Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences, 37(11):760–766, 1951

  16. [16]

    Hankel matrix rank minimization with applications to system identification and realization.SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013

    Maryam Fazel, Ting Kei Pong, Defeng Sun, and Paul Tseng. Hankel matrix rank minimization with applications to system identification and realization.SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013

  17. [17]

    The optimal hard threshold for singular values is 4/ √ 3.IEEE Transactions on Information Theory, 60(8):5040–5053, 2014

    Matan Gavish and David L Donoho. The optimal hard threshold for singular values is 4/ √ 3.IEEE Transactions on Information Theory, 60(8):5040–5053, 2014

  18. [18]

    Fano’s inequality for random variables

    Sebastien Gerchinovitz, Pierre Ménard, and Gilles Stoltz. Fano’s inequality for random variables. 2020

  19. [19]

    Sphere-packing bounds in the grassmann and stiefel manifolds.IEEE Transactions on Information Theory, 51(10):3445–3456, 2005

    Oliver Henkel. Sphere-packing bounds in the grassmann and stiefel manifolds.IEEE Transactions on Information Theory, 51(10):3445–3456, 2005

  20. [20]

    Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975

    Alan Julian Izenman. Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975

  21. [21]

    Sample complexity lower bounds for linear system identification

    Yassir Jedra and Alexandre Proutiere. Sample complexity lower bounds for linear system identification. In2019 IEEE 58th Conference on Decision and Control (CDC), page 2676–2681. IEEE Press, 2019

  22. [22]

    Finite-time identification of linear systems: Fundamental limits and optimal algorithms.IEEE Transactions on Automatic Control, 68(5):2805–2820, 2022

    Yassir Jedra and Alexandre Proutiere. Finite-time identification of linear systems: Fundamental limits and optimal algorithms.IEEE Transactions on Automatic Control, 68(5):2805–2820, 2022. 10

  23. [23]

    Tsybakov

    Vladimir Koltchinskii, Karim Lounici, and Alexandre B. Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion.The Annals of Statistics, 39(5):2302 – 2329, 2011

  24. [24]

    High-probability minimax lower bounds.arXiv preprint arXiv:2406.13447, 2024

    Tianyi Ma, Kabir A Verchand, and Richard J Samworth. High-probability minimax lower bounds.arXiv preprint arXiv:2406.13447, 2024

  25. [25]

    Inequalities: theory of majorization and its applications

    Albert W Marshall, Ingram Olkin, and Barry C Arnold. Inequalities: theory of majorization and its applications. 1979

  26. [26]

    Bounds on the geodesic distances on the stiefel manifold for a family of riemannian metrics.arXiv preprint arXiv:2408.07072, 2024

    Simon Mataigne, P-A Absil, and Nina Miolane. Bounds on the geodesic distances on the stiefel manifold for a family of riemannian metrics.arXiv preprint arXiv:2408.07072, 2024

  27. [27]

    a matrix inequality associated with bounds on solutions of algebraic riccati and lyapunov equation

    Takehiro Mori. Comments on" a matrix inequality associated with bounds on solutions of algebraic riccati and lyapunov equation" by jm saniuk and ib rhodes.IEEE transactions on automatic control, 33(11):1088, 1988

  28. [28]

    Wainwright

    Sahand Negahban and Martin J. Wainwright. Estimation of (near) low-rank matrices with noise and high- dimensional scaling.The Annals of Statistics, 39(2):1069 – 1097, 2011

  29. [29]

    Non-asymptotic identification of lti systems from a single trajectory

    Samet Oymak and Necmiye Ozay. Non-asymptotic identification of lti systems from a single trajectory. In2019 American control conference (ACC), pages 5655–5661. IEEE, 2019

  30. [30]

    Lecture notes in statistics Multivariate reduced-rank regression

    Gregory C Reinsel and Rajabather Palani Velu.Multivariate reduced-rank regression : theory and applications. Lecture notes in statistics Multivariate reduced-rank regression. Springer New York, New York, NY , 1998

  31. [31]

    Tsybakov

    Angelika Rohde and A. Tsybakov. Estimation of high-dimensional low-rank matrices.Annals of Statistics, 39:887–930, 2009

  32. [32]

    Finite time lti system identification.Journal of Machine Learning Research, 22(26):1–61, 2021

    Tuhin Sarkar, Alexander Rakhlin, and Munther A Dahleh. Finite time lti system identification.Journal of Machine Learning Research, 22(26):1–61, 2021

  33. [33]

    Learning linear dynamical systems with semi-parametric least squares

    Max Simchowitz, Ross Boczar, and Benjamin Recht. Learning linear dynamical systems with semi-parametric least squares. InConference on Learning Theory, pages 2714–2802. PMLR, 2019

  34. [34]

    Learning without mixing: Towards a sharp analysis of linear system identification

    Max Simchowitz, Horia Mania, Stephen Tu, Michael I Jordan, and Benjamin Recht. Learning without mixing: Towards a sharp analysis of linear system identification. InConference On Learning Theory, pages 439–473. PMLR, 2018

  35. [35]

    Finite time performance analysis of mimo systems identification.arXiv preprint arXiv:2310.11790, 2023

    Shuai Sun, Jiayun Li, and Yilin Mo. Finite time performance analysis of mimo systems identification.arXiv preprint arXiv:2310.11790, 2023

  36. [36]

    Finite sample identification of low-order lti systems via nuclear norm regularization.IEEE Open Journal of Control Systems, 1:237–254, 2022

    Yue Sun, Samet Oymak, and Maryam Fazel. Finite sample identification of low-order lti systems via nuclear norm regularization.IEEE Open Journal of Control Systems, 1:237–254, 2022

  37. [37]

    Cambridge university press, 2019

    Martin J Wainwright.High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019

  38. [38]

    Adaptive reduced rank regression

    Qiong Wu, Felix M Wong, Yanhua Li, Zhenming Liu, and Varun Kanade. Adaptive reduced rank regression. Advances in Neural Information Processing Systems, 33:4103–4114, 2020

  39. [39]

    Optimal exact least squares rank minimization

    Shuo Xiang, Yunzhang Zhu, Xiaotong Shen, and Jieping Ye. Optimal exact least squares rank minimization. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 480–488, 2012

  40. [40]

    Dimension reduction and coefficient estimation in multivariate linear regression.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346, 2007

    Ming Yuan, Ali Ekici, Zhaosong Lu, and Renato Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346, 2007

  41. [41]

    Learning low-dimensional latent dynamics from high-dimensional observations: Non-asymptotics and lower bounds,

    Yuyang Zhang, Shahriar Talebi, and Na Li. Learning low-dimensional latent dynamics from high-dimensional observations: Non-asymptotics and lower bounds.arXiv preprint arXiv:2405.06089, 2024

  42. [42]

    A tutorial on the non-asymptotic theory of system identification

    Ingvar Ziemann, Anastasios Tsiamis, Bruce Lee, Yassir Jedra, Nikolai Matni, and George J Pappas. A tutorial on the non-asymptotic theory of system identification. In2023 62nd IEEE Conference on Decision and Control (CDC), pages 8921–8939. IEEE, 2023. 11 Contents 1 Introduction 1 2 Preliminaries 2 3 Related Work 3 4 Instance-specific Sample Complexity Lowe...

  43. [43]

    Lemma A.4.There exists a universal constantc≈0.17such thatb≥e −1−log(2) 3 − 1 2 −c q k 2

    The next lemma, proved below, gives a lower bound onb. Lemma A.4.There exists a universal constantc≈0.17such thatb≥e −1−log(2) 3 − 1 2 −c q k 2 . From this lemma, we deduce that the distance between two elements of the packing satisfies: d0 =ab≥ e −1−log(2) 3 − 1 2 −c 2 √ 2 √ k. Hence we have proved that there exists a packing of size2k(2d−k) ≥2 kd with m...

  44. [44]

    More precisely, introduce for alli∈[2 rdy], Ai =A+ 2Cε√r QiW ⊤ −r,whereQ i ∈ P dy r , andC≥1is a universal constant previously defined in Lemma 4.1

    Lower bound that depends on dy.We start by applying Lemmas A.1 and A.2 to A using the confusing models fromC 1(A, r, ε). More precisely, introduce for alli∈[2 rdy], Ai =A+ 2Cε√r QiW ⊤ −r,whereQ i ∈ P dy r , andC≥1is a universal constant previously defined in Lemma 4.1. Application of Lemma A.1: We have ∀i∈[2 rdy],E Ai[LN(Ai, A)] = 1 2σ2 Tr (A−A i)⊤(A−A i)...

  45. [45]

    Application of Lemma A.1: We use the confusing models fromC 2(A, r, ε)

    Lower bound that depends on dx.This case is relevant only for the multivariate regression since in this case, dx might differ fromd y. Application of Lemma A.1: We use the confusing models fromC 2(A, r, ε). Introduce for alli∈[2 rdy], Ai =A+ 2Cε√r UrR⊤ i ,whereR i ∈ P dx r . Then EAi[LN(Ai, A)] = N 2σ2 Tr((A−A i)⊤(A−A i)Σ) = 4N C2ε2 2σ2r Tr(UrR⊤ i ΣRiU ⊤ ...

  46. [46]

    Lower bounds that depends ond y: Consider the confusing models fromC 1(A, k⋆ A,N , ε): ∀i∈[2 k⋆ A,N dy], A i =A+ 2Cε ′ q k⋆ A,N QiW ⊤ −k⋆ A,N ,whereQ i ∈ P dy k⋆ A,N . We apply Lemmas A.1 and A.2 as in the proof of Theorem 4.3 (replace r by k⋆ A,N , and ε by ε′), and derive following lower bound: ε′2 ≥ σ2 log(2) 8 log( 1 δ ) +k ⋆ A,N dy N λk⋆ A,N (Σ) . 20...

  47. [47]

    A.5 Extremal partial trace Lemma A.5.For anyk≤dand two matricesQ∈St d k(R),Σ∈R d×d symmetric positive definite, we have dX i=d−k+1 λi(Σ)≤Tr(Q ⊤ΣQ)≤ kX i=1 λi(Σ).(19) Proof

    Lower bound that depends ond x: The proof follows along the same lines as those in the proof of Theorem 4.3. A.5 Extremal partial trace Lemma A.5.For anyk≤dand two matricesQ∈St d k(R),Σ∈R d×d symmetric positive definite, we have dX i=d−k+1 λi(Σ)≤Tr(Q ⊤ΣQ)≤ kX i=1 λi(Σ).(19) Proof. The proof of this classical result, sometimes referred to asextremal partia...

  48. [48]

    The true error of T-LSE and our upper bound are close

    As a function of n (d= 50 , r= 10 ). The true error of T-LSE and our upper bound are close. Both exhibit a hyperbolic decrease towards zero as can be seen in Figure 5. Figure 5: Performance ofT-LSEand its corresponding error upper bound as a function ofn

  49. [49]

    We observe in Figure 6 that the T-LSE error is linear in d as predicted by the upper bound (and lower bound)

    As a function of d (n= 5000 , r= 10 ). We observe in Figure 6 that the T-LSE error is linear in d as predicted by the upper bound (and lower bound). For the upper bound, the observed deviation from perfect linearity arises from the degradation of the accuracy of the empirical eigenvalueλ H r (ˆΣ)asnis kept fixed. Figure 6: Performance ofT-LSEand its corre...