Near-optimal Rank Adaptive Inference of High Dimensional Matrices

Alexandre Proutiere; Fr\'ed\'eric Zheng; Yassir Jedra

arxiv: 2510.08117 · v2 · submitted 2025-10-09 · 💻 cs.IT · math.IT· stat.ML

Near-optimal Rank Adaptive Inference of High Dimensional Matrices

Fr\'ed\'eric Zheng , Yassir Jedra , Alexandre Proutiere This is my paper

Pith reviewed 2026-05-18 08:44 UTC · model grok-4.3

classification 💻 cs.IT math.ITstat.ML

keywords high-dimensional matrix estimationrank-adaptive algorithmssingular value thresholdinginstance-specific boundslinear measurementsfinite-sample analysismultivariate regressionsystem identification

0 comments

The pith

A rank-adaptive algorithm using least-squares and singular value thresholding nearly achieves the fundamental limits for estimating high-dimensional matrices from linear measurements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to create algorithms that estimate a large matrix from noisy linear measurements by deciding on the fly how many of its main components to fit precisely. It shows that the best number of components to estimate depends on the unknown matrix, the number of measurements, and the noise strength, creating a trade-off between fitting the large parts well and accepting error on the small parts. The authors derive lower bounds on the number of measurements needed for any such adaptive method and introduce an algorithm that gets close to these bounds with explicit error guarantees. This matters because it allows near-optimal performance without knowing the matrix structure ahead of time, improving results in tasks like regression and system identification.

Core claim

The paper establishes instance-specific lower bounds on sample complexity for rank-adaptive matrix estimation and proposes an algorithm combining least-squares estimation with universal singular value thresholding that attains finite-sample error bounds nearly matching these bounds.

What carries the argument

The effective rank selection procedure that adaptively balances the precision in estimating a subset of singular values against the approximation error for the remaining singular values.

If this is right

The optimal effective rank depends on the specific matrix, sample size, and noise level.
Performance nearly matches fundamental limits across different instances without additional assumptions on singular value decay.
This enables better estimation in applications such as multivariate regression and linear dynamical system identification.
Finite-sample bounds provide practical performance guarantees for finite data settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method might extend to estimating other structured objects like tensors where rank adaptation is beneficial.
The identified trade-off could inform adaptive procedures in related high-dimensional problems such as covariance estimation.
Empirical validation on real-world datasets would test if the near-optimality translates beyond theoretical settings.

Load-bearing premise

That an effective rank can be chosen from the data to realize the optimal trade-off between singular value estimation precision and tail approximation error for any matrix without stated assumptions on singular value decay rates.

What would settle it

Observe whether the proposed algorithm's estimation error exceeds the derived lower bound by a large factor for matrices with varying singular value profiles as the number of samples increases.

Figures

Figures reproduced from arXiv: 2510.08117 by Alexandre Proutiere, Fr\'ed\'eric Zheng, Yassir Jedra.

**Figure 2.** Figure 2: depicts the behavior of both upper bounds as a function of τ . We can observe that our upper bound is indeed smaller, and stays bounded between r∥Z∥ 2 2 (corresponding to k(τ ) = r) and ∥A∥ 2 F (corresponding to k(τ ) = 0) [PITH_FULL_IMAGE:figures/full_fig_p033_2.png] view at source ↗

**Figure 3.** Figure 3: Multivariate regression: Frobenius error (left) and effective rank (right) vs noise level [PITH_FULL_IMAGE:figures/full_fig_p034_3.png] view at source ↗

**Figure 4.** Figure 4: System identification: Frobenius error (left) and effective rank (right) vs noise level [PITH_FULL_IMAGE:figures/full_fig_p034_4.png] view at source ↗

**Figure 5.** Figure 5: Performance of T-LSE and its corresponding error upper bound as a function of n. 2. As a function of d (n = 5000, r = 10). We observe in [PITH_FULL_IMAGE:figures/full_fig_p035_5.png] view at source ↗

**Figure 6.** Figure 6: Performance of T-LSE and its corresponding error upper bound as a function of d. Overall, our error upper bound reflects the dependence of T-LSE’s performance on the system parameters, but it is not always tight. Deriving sharper bounds remains an open direction for future work. In particular, an interesting step would be to replace, in our Theorems 6.3 and 6.6, λmin(Σ) ˆ by λ H k (Σ) ˆ . E.5 Computing res… view at source ↗

read the original abstract

We address the problem of estimating a high-dimensional matrix from linear measurements, with a focus on designing optimal rank-adaptive algorithms. These algorithms infer the matrix by estimating its singular values and the corresponding singular vectors up to an effective rank, adaptively determined based on the data. We establish instance-specific lower bounds for the sample complexity of such algorithms, uncovering fundamental trade-offs in selecting the effective rank: balancing the precision of estimating a subset of singular values against the approximation cost incurred for the remaining ones. Our analysis identifies how the optimal effective rank depends on the matrix being estimated, the sample size, and the noise level. We propose an algorithm that combines a Least-Squares estimator with a universal singular value thresholding procedure. We provide finite-sample error bounds for this algorithm and demonstrate that its performance nearly matches the derived fundamental limits. Our results rely on an enhanced analysis of matrix denoising methods based on singular value thresholding. We validate our findings with applications to multivariate regression and linear dynamical system identification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives instance-specific lower bounds on sample complexity for rank-adaptive matrix recovery and shows a least-squares plus universal SVT algorithm that nearly matches them.

read the letter

The core point is that this work derives instance-specific lower bounds that capture the trade-off in choosing an effective rank: how many singular values to estimate precisely versus the approximation error from the tail. It then gives a concrete algorithm that comes close to those limits in finite samples. That combination looks new compared with the usual fixed-rank or non-adaptive results in the matrix completion literature they cite. The finite-sample error bounds and the enhanced analysis of singular value thresholding are the parts that stand out as solid contributions. The applications to multivariate regression and linear dynamical system identification are a reasonable way to show relevance. The bounds make clear that the optimal effective rank depends on the matrix itself, the sample size, and the noise level, which is a useful framing. On the soft spots, the near-optimality claim rests on the data-driven rank choice working well for arbitrary singular-value profiles. The abstract states no decay assumptions, so if the full proofs turn out to need a minimum gap or power-law tail to guarantee the thresholding adapts correctly, the result would be less general than advertised. I would want to check the exact conditions in the theorems and whether the error bars or post-selection effects are handled cleanly. This paper is aimed at people working on high-dimensional statistics and system identification who need adaptive rank methods rather than assuming a fixed rank in advance. A reader who cares about instance-optimal rates and practical algorithms for matrix denoising would find the trade-off analysis and the matching upper bounds worth reading. It deserves a serious referee because the claims are specific, the approach is grounded in existing denoising tools, and the potential gap on assumptions is checkable rather than fatal.

Referee Report

1 major / 1 minor

Summary. The manuscript derives instance-specific lower bounds on the sample complexity of rank-adaptive matrix estimation from linear measurements, highlighting fundamental trade-offs in selecting an effective rank that balances singular-value estimation precision against tail approximation error. It proposes a least-squares estimator paired with a universal singular-value thresholding procedure, establishes finite-sample error bounds for this algorithm, and shows that the bounds nearly match the derived lower limits. The results are applied to multivariate regression and linear dynamical system identification.

Significance. If the finite-sample upper bounds are shown to hold for arbitrary singular-value profiles without additional decay assumptions, the work would be a solid contribution to adaptive high-dimensional inference: it supplies both instance-specific lower bounds and a matching algorithm with explicit constants, which is stronger than typical minimax results. The emphasis on data-driven rank selection and the enhanced SVT analysis are positive features.

major comments (1)

[Finite-sample upper-bound analysis / proof of Theorem on error bound] The near-optimality claim rests on the adaptive rank choice in the universal SVT step correctly realizing the instance-specific trade-off for arbitrary singular-value profiles. The abstract states no decay assumptions; if the proof of the upper bound (likely in the section containing the finite-sample analysis) implicitly requires a minimum gap, a power-law tail, or similar condition to guarantee that the data-driven threshold selects the right effective rank, then the matching to the lower bounds does not hold in full generality. Please identify the precise location of the argument that the thresholding procedure works for general profiles and state any hidden conditions explicitly.

minor comments (1)

[Algorithm section] The definition of the effective rank and the precise form of the universal thresholding parameter could be stated more explicitly in the algorithm description to aid reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and for the constructive feedback. We appreciate the recognition of the potential significance of the instance-specific lower bounds and the data-driven rank selection. We address the major comment below.

read point-by-point responses

Referee: [Finite-sample upper-bound analysis / proof of Theorem on error bound] The near-optimality claim rests on the adaptive rank choice in the universal SVT step correctly realizing the instance-specific trade-off for arbitrary singular-value profiles. The abstract states no decay assumptions; if the proof of the upper bound (likely in the section containing the finite-sample analysis) implicitly requires a minimum gap, a power-law tail, or similar condition to guarantee that the data-driven threshold selects the right effective rank, then the matching to the lower bounds does not hold in full generality. Please identify the precise location of the argument that the thresholding procedure works for general profiles and state any hidden conditions explicitly.

Authors: We thank the referee for this important clarification request. The finite-sample error bound for the LS+USVT estimator is stated as Theorem 4.1 in Section 4. The complete proof appears in Appendix C (specifically, Lemmas C.3–C.5 and the main argument in C.6). The argument establishing that the universal threshold selects the effective rank for arbitrary singular-value profiles is contained in the proof of Lemma C.4, which relies only on sub-Gaussian concentration of the singular values of the noise matrix and a uniform deviation bound that holds simultaneously for all singular values without requiring gaps, power-law decay, or any other tail condition on the true singular values. The threshold is set to a universal level (depending only on noise variance, dimensions, and sample size) that dominates the noise singular values with high probability while remaining below the smallest retained signal singular value for the instance-specific effective rank; this comparison is made directly via the triangle inequality and does not invoke additional profile assumptions. No hidden conditions are present. To make this explicit for readers, we will add a short remark immediately after the statement of Theorem 4.1 in the revised manuscript. revision: partial

Circularity Check

0 steps flagged

No significant circularity; bounds and algorithm appear independently derived from first principles

full rationale

The abstract derives instance-specific lower bounds on sample complexity by balancing estimation precision on retained singular values against tail approximation error, then proposes an LS + universal SVT algorithm whose finite-sample error bounds are shown to nearly match those limits. No equations or steps in the provided text reduce a claimed prediction or bound to a fitted parameter by construction, nor do they rely on self-citation chains or imported uniqueness theorems for the central result. The adaptive rank selection is presented as data-driven without explicit reduction to the input data in a tautological way, and the analysis of matrix denoising is described as enhanced but independent. This is the common honest case of a self-contained derivation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The effective rank selection and universal thresholding threshold are likely data-dependent quantities whose precise definition and independence from the target error are not visible.

pith-pipeline@v0.9.0 · 5703 in / 1088 out tokens · 24011 ms · 2026-05-18T08:44:18.830528+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose an algorithm that combines a Least-Squares estimator with a universal singular value thresholding procedure... min_k [k σ² (d̄ + log(1/δ))/(n λ_min(Σ)) + Σ_{i>k} s_i²(A)]
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

instance-specific lower bounds... k⋆_{A,n} := arg min_k [ErrReg(k,n,Σ) + Σ_{i>k} s_i²(A)]

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

[1]

Estimating linear restrictions on regression coefficients for multivariate normal distributions.The Annals of Mathematical Statistics, pages 327–351, 1951

Theodore Wilbur Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions.The Annals of Mathematical Statistics, pages 327–351, 1951

work page 1951
[2]

Asymptotic distribution of the reduced rank regression estimator under general conditions.The Annals of Statistics, 27(4):1141–1154, 1999

Theodore Wilbur Anderson. Asymptotic distribution of the reduced rank regression estimator under general conditions.The Annals of Statistics, 27(4):1141–1154, 1999

work page 1999
[3]

A new approach to learning linear dynamical systems

Ainesh Bakshi, Allen Liu, Ankur Moitra, and Morris Yau. A new approach to learning linear dynamical systems. InProceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 335–348, 2023

work page 2023
[4]

Simple relative deviation bounds for covariance and gram matrices.arXiv preprint arXiv:2410.05754, 2024

Daniel Barzilai and Ohad Shamir. Simple relative deviation bounds for covariance and gram matrices.arXiv preprint arXiv:2410.05754, 2024

work page arXiv 2024
[5]

Inequalities for the gamma function.Archiv der Mathematik, 91(6):554–563, 2008

Necdet Batir. Inequalities for the gamma function.Archiv der Mathematik, 91(6):554–563, 2008

work page 2008
[6]

Optimal selection of reduced rank estimators of high- dimensional matrices.The Annals of Statistics, 39, 04 2010

Florentina Bunea, Yiyuan She, and Marten Wegkamp. Optimal selection of reduced rank estimators of high- dimensional matrices.The Annals of Statistics, 39, 04 2010

work page 2010
[7]

Optimal estimation and rank detection for sparse spiked covariance matrices.Probability theory and related fields, 161(3):781–815, 2015

Tony Cai, Zongming Ma, and Yihong Wu. Optimal estimation and rank detection for sparse spiked covariance matrices.Probability theory and related fields, 161(3):781–815, 2015

work page 2015
[8]

Exact matrix completion via convex optimization.Commun

Emmanuel Candès and Benjamin Recht. Exact matrix completion via convex optimization.Commun. ACM, 55(6):111–119, June 2012

work page 2012
[9]

Matrix completion with noise.Proceedings of the IEEE, 98(6):925–936, 2010

Emmanuel J Candes and Yaniv Plan. Matrix completion with noise.Proceedings of the IEEE, 98(6):925–936, 2010

work page 2010
[10]

Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

Emmanuel J Candes and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

work page 2011
[11]

Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177– 214, 2015

Sourav Chatterjee. Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177– 214, 2015

work page 2015
[12]

Reduced rank regression via adaptive nuclear norm penalization

Kun Chen, Hongbo Dong, and Kung-Sik Chan. Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 100(4):901–920, 09 2013

work page 2013
[13]

Non asymptotic estimation lower bounds for lti state space models with cram\’er-rao and van trees.arXiv preprint arXiv:2109.08582, 2021

Boualem Djehiche and Othmane Mazhar. Non asymptotic estimation lower bounds for lti state space models with cram\’er-rao and van trees.arXiv preprint arXiv:2109.08582, 2021

work page arXiv 2021
[14]

Efficient learning of hidden state lti state space models of unknown order.arXiv preprint arXiv:2202.01625, 2022

Boualem Djehiche and Othmane Mazhar. Efficient learning of hidden state lti state space models of unknown order.arXiv preprint arXiv:2202.01625, 2022

work page arXiv 2022
[15]

Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences, 37(11):760–766, 1951

Ky Fan. Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences, 37(11):760–766, 1951

work page 1951
[16]

Hankel matrix rank minimization with applications to system identification and realization.SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013

Maryam Fazel, Ting Kei Pong, Defeng Sun, and Paul Tseng. Hankel matrix rank minimization with applications to system identification and realization.SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013

work page 2013
[17]

The optimal hard threshold for singular values is 4/ √ 3.IEEE Transactions on Information Theory, 60(8):5040–5053, 2014

Matan Gavish and David L Donoho. The optimal hard threshold for singular values is 4/ √ 3.IEEE Transactions on Information Theory, 60(8):5040–5053, 2014

work page 2014
[18]

Fano’s inequality for random variables

Sebastien Gerchinovitz, Pierre Ménard, and Gilles Stoltz. Fano’s inequality for random variables. 2020

work page 2020
[19]

Sphere-packing bounds in the grassmann and stiefel manifolds.IEEE Transactions on Information Theory, 51(10):3445–3456, 2005

Oliver Henkel. Sphere-packing bounds in the grassmann and stiefel manifolds.IEEE Transactions on Information Theory, 51(10):3445–3456, 2005

work page 2005
[20]

Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975

Alan Julian Izenman. Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975

work page 1975
[21]

Sample complexity lower bounds for linear system identification

Yassir Jedra and Alexandre Proutiere. Sample complexity lower bounds for linear system identification. In2019 IEEE 58th Conference on Decision and Control (CDC), page 2676–2681. IEEE Press, 2019

work page 2019
[22]

Finite-time identification of linear systems: Fundamental limits and optimal algorithms.IEEE Transactions on Automatic Control, 68(5):2805–2820, 2022

Yassir Jedra and Alexandre Proutiere. Finite-time identification of linear systems: Fundamental limits and optimal algorithms.IEEE Transactions on Automatic Control, 68(5):2805–2820, 2022. 10

work page 2022
[23]

Tsybakov

Vladimir Koltchinskii, Karim Lounici, and Alexandre B. Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion.The Annals of Statistics, 39(5):2302 – 2329, 2011

work page 2011
[24]

High-probability minimax lower bounds.arXiv preprint arXiv:2406.13447, 2024

Tianyi Ma, Kabir A Verchand, and Richard J Samworth. High-probability minimax lower bounds.arXiv preprint arXiv:2406.13447, 2024

work page arXiv 2024
[25]

Inequalities: theory of majorization and its applications

Albert W Marshall, Ingram Olkin, and Barry C Arnold. Inequalities: theory of majorization and its applications. 1979

work page 1979
[26]

Bounds on the geodesic distances on the stiefel manifold for a family of riemannian metrics.arXiv preprint arXiv:2408.07072, 2024

Simon Mataigne, P-A Absil, and Nina Miolane. Bounds on the geodesic distances on the stiefel manifold for a family of riemannian metrics.arXiv preprint arXiv:2408.07072, 2024

work page arXiv 2024
[27]

a matrix inequality associated with bounds on solutions of algebraic riccati and lyapunov equation

Takehiro Mori. Comments on" a matrix inequality associated with bounds on solutions of algebraic riccati and lyapunov equation" by jm saniuk and ib rhodes.IEEE transactions on automatic control, 33(11):1088, 1988

work page 1988
[28]

Wainwright

Sahand Negahban and Martin J. Wainwright. Estimation of (near) low-rank matrices with noise and high- dimensional scaling.The Annals of Statistics, 39(2):1069 – 1097, 2011

work page 2011
[29]

Non-asymptotic identification of lti systems from a single trajectory

Samet Oymak and Necmiye Ozay. Non-asymptotic identification of lti systems from a single trajectory. In2019 American control conference (ACC), pages 5655–5661. IEEE, 2019

work page 2019
[30]

Lecture notes in statistics Multivariate reduced-rank regression

Gregory C Reinsel and Rajabather Palani Velu.Multivariate reduced-rank regression : theory and applications. Lecture notes in statistics Multivariate reduced-rank regression. Springer New York, New York, NY , 1998

work page 1998
[31]

Tsybakov

Angelika Rohde and A. Tsybakov. Estimation of high-dimensional low-rank matrices.Annals of Statistics, 39:887–930, 2009

work page 2009
[32]

Finite time lti system identification.Journal of Machine Learning Research, 22(26):1–61, 2021

Tuhin Sarkar, Alexander Rakhlin, and Munther A Dahleh. Finite time lti system identification.Journal of Machine Learning Research, 22(26):1–61, 2021

work page 2021
[33]

Learning linear dynamical systems with semi-parametric least squares

Max Simchowitz, Ross Boczar, and Benjamin Recht. Learning linear dynamical systems with semi-parametric least squares. InConference on Learning Theory, pages 2714–2802. PMLR, 2019

work page 2019
[34]

Learning without mixing: Towards a sharp analysis of linear system identification

Max Simchowitz, Horia Mania, Stephen Tu, Michael I Jordan, and Benjamin Recht. Learning without mixing: Towards a sharp analysis of linear system identification. InConference On Learning Theory, pages 439–473. PMLR, 2018

work page 2018
[35]

Finite time performance analysis of mimo systems identification.arXiv preprint arXiv:2310.11790, 2023

Shuai Sun, Jiayun Li, and Yilin Mo. Finite time performance analysis of mimo systems identification.arXiv preprint arXiv:2310.11790, 2023

work page arXiv 2023
[36]

Finite sample identification of low-order lti systems via nuclear norm regularization.IEEE Open Journal of Control Systems, 1:237–254, 2022

Yue Sun, Samet Oymak, and Maryam Fazel. Finite sample identification of low-order lti systems via nuclear norm regularization.IEEE Open Journal of Control Systems, 1:237–254, 2022

work page 2022
[37]

Cambridge university press, 2019

Martin J Wainwright.High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019

work page 2019
[38]

Adaptive reduced rank regression

Qiong Wu, Felix M Wong, Yanhua Li, Zhenming Liu, and Varun Kanade. Adaptive reduced rank regression. Advances in Neural Information Processing Systems, 33:4103–4114, 2020

work page 2020
[39]

Optimal exact least squares rank minimization

Shuo Xiang, Yunzhang Zhu, Xiaotong Shen, and Jieping Ye. Optimal exact least squares rank minimization. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 480–488, 2012

work page 2012
[40]

Dimension reduction and coefficient estimation in multivariate linear regression.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346, 2007

Ming Yuan, Ali Ekici, Zhaosong Lu, and Renato Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346, 2007

work page 2007
[41]

Learning low-dimensional latent dynamics from high-dimensional observations: Non-asymptotics and lower bounds,

Yuyang Zhang, Shahriar Talebi, and Na Li. Learning low-dimensional latent dynamics from high-dimensional observations: Non-asymptotics and lower bounds.arXiv preprint arXiv:2405.06089, 2024

work page arXiv 2024
[42]

A tutorial on the non-asymptotic theory of system identification

Ingvar Ziemann, Anastasios Tsiamis, Bruce Lee, Yassir Jedra, Nikolai Matni, and George J Pappas. A tutorial on the non-asymptotic theory of system identification. In2023 62nd IEEE Conference on Decision and Control (CDC), pages 8921–8939. IEEE, 2023. 11 Contents 1 Introduction 1 2 Preliminaries 2 3 Related Work 3 4 Instance-specific Sample Complexity Lowe...

work page 2023
[43]

Lemma A.4.There exists a universal constantc≈0.17such thatb≥e −1−log(2) 3 − 1 2 −c q k 2

The next lemma, proved below, gives a lower bound onb. Lemma A.4.There exists a universal constantc≈0.17such thatb≥e −1−log(2) 3 − 1 2 −c q k 2 . From this lemma, we deduce that the distance between two elements of the packing satisfies: d0 =ab≥ e −1−log(2) 3 − 1 2 −c 2 √ 2 √ k. Hence we have proved that there exists a packing of size2k(2d−k) ≥2 kd with m...

work page
[44]

More precisely, introduce for alli∈[2 rdy], Ai =A+ 2Cε√r QiW ⊤ −r,whereQ i ∈ P dy r , andC≥1is a universal constant previously defined in Lemma 4.1

Lower bound that depends on dy.We start by applying Lemmas A.1 and A.2 to A using the confusing models fromC 1(A, r, ε). More precisely, introduce for alli∈[2 rdy], Ai =A+ 2Cε√r QiW ⊤ −r,whereQ i ∈ P dy r , andC≥1is a universal constant previously defined in Lemma 4.1. Application of Lemma A.1: We have ∀i∈[2 rdy],E Ai[LN(Ai, A)] = 1 2σ2 Tr (A−A i)⊤(A−A i)...

work page
[45]

Application of Lemma A.1: We use the confusing models fromC 2(A, r, ε)

Lower bound that depends on dx.This case is relevant only for the multivariate regression since in this case, dx might differ fromd y. Application of Lemma A.1: We use the confusing models fromC 2(A, r, ε). Introduce for alli∈[2 rdy], Ai =A+ 2Cε√r UrR⊤ i ,whereR i ∈ P dx r . Then EAi[LN(Ai, A)] = N 2σ2 Tr((A−A i)⊤(A−A i)Σ) = 4N C2ε2 2σ2r Tr(UrR⊤ i ΣRiU ⊤ ...

work page
[46]

Lower bounds that depends ond y: Consider the confusing models fromC 1(A, k⋆ A,N , ε): ∀i∈[2 k⋆ A,N dy], A i =A+ 2Cε ′ q k⋆ A,N QiW ⊤ −k⋆ A,N ,whereQ i ∈ P dy k⋆ A,N . We apply Lemmas A.1 and A.2 as in the proof of Theorem 4.3 (replace r by k⋆ A,N , and ε by ε′), and derive following lower bound: ε′2 ≥ σ2 log(2) 8 log( 1 δ ) +k ⋆ A,N dy N λk⋆ A,N (Σ) . 20...

work page
[47]

A.5 Extremal partial trace Lemma A.5.For anyk≤dand two matricesQ∈St d k(R),Σ∈R d×d symmetric positive definite, we have dX i=d−k+1 λi(Σ)≤Tr(Q ⊤ΣQ)≤ kX i=1 λi(Σ).(19) Proof

Lower bound that depends ond x: The proof follows along the same lines as those in the proof of Theorem 4.3. A.5 Extremal partial trace Lemma A.5.For anyk≤dand two matricesQ∈St d k(R),Σ∈R d×d symmetric positive definite, we have dX i=d−k+1 λi(Σ)≤Tr(Q ⊤ΣQ)≤ kX i=1 λi(Σ).(19) Proof. The proof of this classical result, sometimes referred to asextremal partia...

work page
[48]

The true error of T-LSE and our upper bound are close

As a function of n (d= 50 , r= 10 ). The true error of T-LSE and our upper bound are close. Both exhibit a hyperbolic decrease towards zero as can be seen in Figure 5. Figure 5: Performance ofT-LSEand its corresponding error upper bound as a function ofn

work page
[49]

We observe in Figure 6 that the T-LSE error is linear in d as predicted by the upper bound (and lower bound)

As a function of d (n= 5000 , r= 10 ). We observe in Figure 6 that the T-LSE error is linear in d as predicted by the upper bound (and lower bound). For the upper bound, the observed deviation from perfect linearity arises from the degradation of the accuracy of the empirical eigenvalueλ H r (ˆΣ)asnis kept fixed. Figure 6: Performance ofT-LSEand its corre...

work page

[1] [1]

Estimating linear restrictions on regression coefficients for multivariate normal distributions.The Annals of Mathematical Statistics, pages 327–351, 1951

Theodore Wilbur Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions.The Annals of Mathematical Statistics, pages 327–351, 1951

work page 1951

[2] [2]

Asymptotic distribution of the reduced rank regression estimator under general conditions.The Annals of Statistics, 27(4):1141–1154, 1999

Theodore Wilbur Anderson. Asymptotic distribution of the reduced rank regression estimator under general conditions.The Annals of Statistics, 27(4):1141–1154, 1999

work page 1999

[3] [3]

A new approach to learning linear dynamical systems

Ainesh Bakshi, Allen Liu, Ankur Moitra, and Morris Yau. A new approach to learning linear dynamical systems. InProceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 335–348, 2023

work page 2023

[4] [4]

Simple relative deviation bounds for covariance and gram matrices.arXiv preprint arXiv:2410.05754, 2024

Daniel Barzilai and Ohad Shamir. Simple relative deviation bounds for covariance and gram matrices.arXiv preprint arXiv:2410.05754, 2024

work page arXiv 2024

[5] [5]

Inequalities for the gamma function.Archiv der Mathematik, 91(6):554–563, 2008

Necdet Batir. Inequalities for the gamma function.Archiv der Mathematik, 91(6):554–563, 2008

work page 2008

[6] [6]

Optimal selection of reduced rank estimators of high- dimensional matrices.The Annals of Statistics, 39, 04 2010

Florentina Bunea, Yiyuan She, and Marten Wegkamp. Optimal selection of reduced rank estimators of high- dimensional matrices.The Annals of Statistics, 39, 04 2010

work page 2010

[7] [7]

Optimal estimation and rank detection for sparse spiked covariance matrices.Probability theory and related fields, 161(3):781–815, 2015

Tony Cai, Zongming Ma, and Yihong Wu. Optimal estimation and rank detection for sparse spiked covariance matrices.Probability theory and related fields, 161(3):781–815, 2015

work page 2015

[8] [8]

Exact matrix completion via convex optimization.Commun

Emmanuel Candès and Benjamin Recht. Exact matrix completion via convex optimization.Commun. ACM, 55(6):111–119, June 2012

work page 2012

[9] [9]

Matrix completion with noise.Proceedings of the IEEE, 98(6):925–936, 2010

Emmanuel J Candes and Yaniv Plan. Matrix completion with noise.Proceedings of the IEEE, 98(6):925–936, 2010

work page 2010

[10] [10]

Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

Emmanuel J Candes and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011

work page 2011

[11] [11]

Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177– 214, 2015

Sourav Chatterjee. Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177– 214, 2015

work page 2015

[12] [12]

Reduced rank regression via adaptive nuclear norm penalization

Kun Chen, Hongbo Dong, and Kung-Sik Chan. Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 100(4):901–920, 09 2013

work page 2013

[13] [13]

Non asymptotic estimation lower bounds for lti state space models with cram\’er-rao and van trees.arXiv preprint arXiv:2109.08582, 2021

Boualem Djehiche and Othmane Mazhar. Non asymptotic estimation lower bounds for lti state space models with cram\’er-rao and van trees.arXiv preprint arXiv:2109.08582, 2021

work page arXiv 2021

[14] [14]

Efficient learning of hidden state lti state space models of unknown order.arXiv preprint arXiv:2202.01625, 2022

Boualem Djehiche and Othmane Mazhar. Efficient learning of hidden state lti state space models of unknown order.arXiv preprint arXiv:2202.01625, 2022

work page arXiv 2022

[15] [15]

Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences, 37(11):760–766, 1951

Ky Fan. Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences, 37(11):760–766, 1951

work page 1951

[16] [16]

Hankel matrix rank minimization with applications to system identification and realization.SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013

Maryam Fazel, Ting Kei Pong, Defeng Sun, and Paul Tseng. Hankel matrix rank minimization with applications to system identification and realization.SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013

work page 2013

[17] [17]

The optimal hard threshold for singular values is 4/ √ 3.IEEE Transactions on Information Theory, 60(8):5040–5053, 2014

Matan Gavish and David L Donoho. The optimal hard threshold for singular values is 4/ √ 3.IEEE Transactions on Information Theory, 60(8):5040–5053, 2014

work page 2014

[18] [18]

Fano’s inequality for random variables

Sebastien Gerchinovitz, Pierre Ménard, and Gilles Stoltz. Fano’s inequality for random variables. 2020

work page 2020

[19] [19]

Sphere-packing bounds in the grassmann and stiefel manifolds.IEEE Transactions on Information Theory, 51(10):3445–3456, 2005

Oliver Henkel. Sphere-packing bounds in the grassmann and stiefel manifolds.IEEE Transactions on Information Theory, 51(10):3445–3456, 2005

work page 2005

[20] [20]

Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975

Alan Julian Izenman. Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975

work page 1975

[21] [21]

Sample complexity lower bounds for linear system identification

Yassir Jedra and Alexandre Proutiere. Sample complexity lower bounds for linear system identification. In2019 IEEE 58th Conference on Decision and Control (CDC), page 2676–2681. IEEE Press, 2019

work page 2019

[22] [22]

Finite-time identification of linear systems: Fundamental limits and optimal algorithms.IEEE Transactions on Automatic Control, 68(5):2805–2820, 2022

Yassir Jedra and Alexandre Proutiere. Finite-time identification of linear systems: Fundamental limits and optimal algorithms.IEEE Transactions on Automatic Control, 68(5):2805–2820, 2022. 10

work page 2022

[23] [23]

Tsybakov

Vladimir Koltchinskii, Karim Lounici, and Alexandre B. Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion.The Annals of Statistics, 39(5):2302 – 2329, 2011

work page 2011

[24] [24]

High-probability minimax lower bounds.arXiv preprint arXiv:2406.13447, 2024

Tianyi Ma, Kabir A Verchand, and Richard J Samworth. High-probability minimax lower bounds.arXiv preprint arXiv:2406.13447, 2024

work page arXiv 2024

[25] [25]

Inequalities: theory of majorization and its applications

Albert W Marshall, Ingram Olkin, and Barry C Arnold. Inequalities: theory of majorization and its applications. 1979

work page 1979

[26] [26]

Bounds on the geodesic distances on the stiefel manifold for a family of riemannian metrics.arXiv preprint arXiv:2408.07072, 2024

Simon Mataigne, P-A Absil, and Nina Miolane. Bounds on the geodesic distances on the stiefel manifold for a family of riemannian metrics.arXiv preprint arXiv:2408.07072, 2024

work page arXiv 2024

[27] [27]

a matrix inequality associated with bounds on solutions of algebraic riccati and lyapunov equation

Takehiro Mori. Comments on" a matrix inequality associated with bounds on solutions of algebraic riccati and lyapunov equation" by jm saniuk and ib rhodes.IEEE transactions on automatic control, 33(11):1088, 1988

work page 1988

[28] [28]

Wainwright

Sahand Negahban and Martin J. Wainwright. Estimation of (near) low-rank matrices with noise and high- dimensional scaling.The Annals of Statistics, 39(2):1069 – 1097, 2011

work page 2011

[29] [29]

Non-asymptotic identification of lti systems from a single trajectory

Samet Oymak and Necmiye Ozay. Non-asymptotic identification of lti systems from a single trajectory. In2019 American control conference (ACC), pages 5655–5661. IEEE, 2019

work page 2019

[30] [30]

Lecture notes in statistics Multivariate reduced-rank regression

Gregory C Reinsel and Rajabather Palani Velu.Multivariate reduced-rank regression : theory and applications. Lecture notes in statistics Multivariate reduced-rank regression. Springer New York, New York, NY , 1998

work page 1998

[31] [31]

Tsybakov

Angelika Rohde and A. Tsybakov. Estimation of high-dimensional low-rank matrices.Annals of Statistics, 39:887–930, 2009

work page 2009

[32] [32]

Finite time lti system identification.Journal of Machine Learning Research, 22(26):1–61, 2021

Tuhin Sarkar, Alexander Rakhlin, and Munther A Dahleh. Finite time lti system identification.Journal of Machine Learning Research, 22(26):1–61, 2021

work page 2021

[33] [33]

Learning linear dynamical systems with semi-parametric least squares

Max Simchowitz, Ross Boczar, and Benjamin Recht. Learning linear dynamical systems with semi-parametric least squares. InConference on Learning Theory, pages 2714–2802. PMLR, 2019

work page 2019

[34] [34]

Learning without mixing: Towards a sharp analysis of linear system identification

Max Simchowitz, Horia Mania, Stephen Tu, Michael I Jordan, and Benjamin Recht. Learning without mixing: Towards a sharp analysis of linear system identification. InConference On Learning Theory, pages 439–473. PMLR, 2018

work page 2018

[35] [35]

Finite time performance analysis of mimo systems identification.arXiv preprint arXiv:2310.11790, 2023

Shuai Sun, Jiayun Li, and Yilin Mo. Finite time performance analysis of mimo systems identification.arXiv preprint arXiv:2310.11790, 2023

work page arXiv 2023

[36] [36]

Finite sample identification of low-order lti systems via nuclear norm regularization.IEEE Open Journal of Control Systems, 1:237–254, 2022

Yue Sun, Samet Oymak, and Maryam Fazel. Finite sample identification of low-order lti systems via nuclear norm regularization.IEEE Open Journal of Control Systems, 1:237–254, 2022

work page 2022

[37] [37]

Cambridge university press, 2019

Martin J Wainwright.High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019

work page 2019

[38] [38]

Adaptive reduced rank regression

Qiong Wu, Felix M Wong, Yanhua Li, Zhenming Liu, and Varun Kanade. Adaptive reduced rank regression. Advances in Neural Information Processing Systems, 33:4103–4114, 2020

work page 2020

[39] [39]

Optimal exact least squares rank minimization

Shuo Xiang, Yunzhang Zhu, Xiaotong Shen, and Jieping Ye. Optimal exact least squares rank minimization. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 480–488, 2012

work page 2012

[40] [40]

Dimension reduction and coefficient estimation in multivariate linear regression.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346, 2007

Ming Yuan, Ali Ekici, Zhaosong Lu, and Renato Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346, 2007

work page 2007

[41] [41]

Learning low-dimensional latent dynamics from high-dimensional observations: Non-asymptotics and lower bounds,

Yuyang Zhang, Shahriar Talebi, and Na Li. Learning low-dimensional latent dynamics from high-dimensional observations: Non-asymptotics and lower bounds.arXiv preprint arXiv:2405.06089, 2024

work page arXiv 2024

[42] [42]

A tutorial on the non-asymptotic theory of system identification

Ingvar Ziemann, Anastasios Tsiamis, Bruce Lee, Yassir Jedra, Nikolai Matni, and George J Pappas. A tutorial on the non-asymptotic theory of system identification. In2023 62nd IEEE Conference on Decision and Control (CDC), pages 8921–8939. IEEE, 2023. 11 Contents 1 Introduction 1 2 Preliminaries 2 3 Related Work 3 4 Instance-specific Sample Complexity Lowe...

work page 2023

[43] [43]

Lemma A.4.There exists a universal constantc≈0.17such thatb≥e −1−log(2) 3 − 1 2 −c q k 2

The next lemma, proved below, gives a lower bound onb. Lemma A.4.There exists a universal constantc≈0.17such thatb≥e −1−log(2) 3 − 1 2 −c q k 2 . From this lemma, we deduce that the distance between two elements of the packing satisfies: d0 =ab≥ e −1−log(2) 3 − 1 2 −c 2 √ 2 √ k. Hence we have proved that there exists a packing of size2k(2d−k) ≥2 kd with m...

work page

[44] [44]

More precisely, introduce for alli∈[2 rdy], Ai =A+ 2Cε√r QiW ⊤ −r,whereQ i ∈ P dy r , andC≥1is a universal constant previously defined in Lemma 4.1

Lower bound that depends on dy.We start by applying Lemmas A.1 and A.2 to A using the confusing models fromC 1(A, r, ε). More precisely, introduce for alli∈[2 rdy], Ai =A+ 2Cε√r QiW ⊤ −r,whereQ i ∈ P dy r , andC≥1is a universal constant previously defined in Lemma 4.1. Application of Lemma A.1: We have ∀i∈[2 rdy],E Ai[LN(Ai, A)] = 1 2σ2 Tr (A−A i)⊤(A−A i)...

work page

[45] [45]

Application of Lemma A.1: We use the confusing models fromC 2(A, r, ε)

Lower bound that depends on dx.This case is relevant only for the multivariate regression since in this case, dx might differ fromd y. Application of Lemma A.1: We use the confusing models fromC 2(A, r, ε). Introduce for alli∈[2 rdy], Ai =A+ 2Cε√r UrR⊤ i ,whereR i ∈ P dx r . Then EAi[LN(Ai, A)] = N 2σ2 Tr((A−A i)⊤(A−A i)Σ) = 4N C2ε2 2σ2r Tr(UrR⊤ i ΣRiU ⊤ ...

work page

[46] [46]

Lower bounds that depends ond y: Consider the confusing models fromC 1(A, k⋆ A,N , ε): ∀i∈[2 k⋆ A,N dy], A i =A+ 2Cε ′ q k⋆ A,N QiW ⊤ −k⋆ A,N ,whereQ i ∈ P dy k⋆ A,N . We apply Lemmas A.1 and A.2 as in the proof of Theorem 4.3 (replace r by k⋆ A,N , and ε by ε′), and derive following lower bound: ε′2 ≥ σ2 log(2) 8 log( 1 δ ) +k ⋆ A,N dy N λk⋆ A,N (Σ) . 20...

work page

[47] [47]

A.5 Extremal partial trace Lemma A.5.For anyk≤dand two matricesQ∈St d k(R),Σ∈R d×d symmetric positive definite, we have dX i=d−k+1 λi(Σ)≤Tr(Q ⊤ΣQ)≤ kX i=1 λi(Σ).(19) Proof

Lower bound that depends ond x: The proof follows along the same lines as those in the proof of Theorem 4.3. A.5 Extremal partial trace Lemma A.5.For anyk≤dand two matricesQ∈St d k(R),Σ∈R d×d symmetric positive definite, we have dX i=d−k+1 λi(Σ)≤Tr(Q ⊤ΣQ)≤ kX i=1 λi(Σ).(19) Proof. The proof of this classical result, sometimes referred to asextremal partia...

work page

[48] [48]

The true error of T-LSE and our upper bound are close

As a function of n (d= 50 , r= 10 ). The true error of T-LSE and our upper bound are close. Both exhibit a hyperbolic decrease towards zero as can be seen in Figure 5. Figure 5: Performance ofT-LSEand its corresponding error upper bound as a function ofn

work page

[49] [49]

We observe in Figure 6 that the T-LSE error is linear in d as predicted by the upper bound (and lower bound)

As a function of d (n= 5000 , r= 10 ). We observe in Figure 6 that the T-LSE error is linear in d as predicted by the upper bound (and lower bound). For the upper bound, the observed deviation from perfect linearity arises from the degradation of the accuracy of the empirical eigenvalueλ H r (ˆΣ)asnis kept fixed. Figure 6: Performance ofT-LSEand its corre...

work page