Near-optimal Rank Adaptive Inference of High Dimensional Matrices
Pith reviewed 2026-05-18 08:44 UTC · model grok-4.3
The pith
A rank-adaptive algorithm using least-squares and singular value thresholding nearly achieves the fundamental limits for estimating high-dimensional matrices from linear measurements.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes instance-specific lower bounds on sample complexity for rank-adaptive matrix estimation and proposes an algorithm combining least-squares estimation with universal singular value thresholding that attains finite-sample error bounds nearly matching these bounds.
What carries the argument
The effective rank selection procedure that adaptively balances the precision in estimating a subset of singular values against the approximation error for the remaining singular values.
If this is right
- The optimal effective rank depends on the specific matrix, sample size, and noise level.
- Performance nearly matches fundamental limits across different instances without additional assumptions on singular value decay.
- This enables better estimation in applications such as multivariate regression and linear dynamical system identification.
- Finite-sample bounds provide practical performance guarantees for finite data settings.
Where Pith is reading between the lines
- This method might extend to estimating other structured objects like tensors where rank adaptation is beneficial.
- The identified trade-off could inform adaptive procedures in related high-dimensional problems such as covariance estimation.
- Empirical validation on real-world datasets would test if the near-optimality translates beyond theoretical settings.
Load-bearing premise
That an effective rank can be chosen from the data to realize the optimal trade-off between singular value estimation precision and tail approximation error for any matrix without stated assumptions on singular value decay rates.
What would settle it
Observe whether the proposed algorithm's estimation error exceeds the derived lower bound by a large factor for matrices with varying singular value profiles as the number of samples increases.
Figures
read the original abstract
We address the problem of estimating a high-dimensional matrix from linear measurements, with a focus on designing optimal rank-adaptive algorithms. These algorithms infer the matrix by estimating its singular values and the corresponding singular vectors up to an effective rank, adaptively determined based on the data. We establish instance-specific lower bounds for the sample complexity of such algorithms, uncovering fundamental trade-offs in selecting the effective rank: balancing the precision of estimating a subset of singular values against the approximation cost incurred for the remaining ones. Our analysis identifies how the optimal effective rank depends on the matrix being estimated, the sample size, and the noise level. We propose an algorithm that combines a Least-Squares estimator with a universal singular value thresholding procedure. We provide finite-sample error bounds for this algorithm and demonstrate that its performance nearly matches the derived fundamental limits. Our results rely on an enhanced analysis of matrix denoising methods based on singular value thresholding. We validate our findings with applications to multivariate regression and linear dynamical system identification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives instance-specific lower bounds on the sample complexity of rank-adaptive matrix estimation from linear measurements, highlighting fundamental trade-offs in selecting an effective rank that balances singular-value estimation precision against tail approximation error. It proposes a least-squares estimator paired with a universal singular-value thresholding procedure, establishes finite-sample error bounds for this algorithm, and shows that the bounds nearly match the derived lower limits. The results are applied to multivariate regression and linear dynamical system identification.
Significance. If the finite-sample upper bounds are shown to hold for arbitrary singular-value profiles without additional decay assumptions, the work would be a solid contribution to adaptive high-dimensional inference: it supplies both instance-specific lower bounds and a matching algorithm with explicit constants, which is stronger than typical minimax results. The emphasis on data-driven rank selection and the enhanced SVT analysis are positive features.
major comments (1)
- [Finite-sample upper-bound analysis / proof of Theorem on error bound] The near-optimality claim rests on the adaptive rank choice in the universal SVT step correctly realizing the instance-specific trade-off for arbitrary singular-value profiles. The abstract states no decay assumptions; if the proof of the upper bound (likely in the section containing the finite-sample analysis) implicitly requires a minimum gap, a power-law tail, or similar condition to guarantee that the data-driven threshold selects the right effective rank, then the matching to the lower bounds does not hold in full generality. Please identify the precise location of the argument that the thresholding procedure works for general profiles and state any hidden conditions explicitly.
minor comments (1)
- [Algorithm section] The definition of the effective rank and the precise form of the universal thresholding parameter could be stated more explicitly in the algorithm description to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and for the constructive feedback. We appreciate the recognition of the potential significance of the instance-specific lower bounds and the data-driven rank selection. We address the major comment below.
read point-by-point responses
-
Referee: [Finite-sample upper-bound analysis / proof of Theorem on error bound] The near-optimality claim rests on the adaptive rank choice in the universal SVT step correctly realizing the instance-specific trade-off for arbitrary singular-value profiles. The abstract states no decay assumptions; if the proof of the upper bound (likely in the section containing the finite-sample analysis) implicitly requires a minimum gap, a power-law tail, or similar condition to guarantee that the data-driven threshold selects the right effective rank, then the matching to the lower bounds does not hold in full generality. Please identify the precise location of the argument that the thresholding procedure works for general profiles and state any hidden conditions explicitly.
Authors: We thank the referee for this important clarification request. The finite-sample error bound for the LS+USVT estimator is stated as Theorem 4.1 in Section 4. The complete proof appears in Appendix C (specifically, Lemmas C.3–C.5 and the main argument in C.6). The argument establishing that the universal threshold selects the effective rank for arbitrary singular-value profiles is contained in the proof of Lemma C.4, which relies only on sub-Gaussian concentration of the singular values of the noise matrix and a uniform deviation bound that holds simultaneously for all singular values without requiring gaps, power-law decay, or any other tail condition on the true singular values. The threshold is set to a universal level (depending only on noise variance, dimensions, and sample size) that dominates the noise singular values with high probability while remaining below the smallest retained signal singular value for the instance-specific effective rank; this comparison is made directly via the triangle inequality and does not invoke additional profile assumptions. No hidden conditions are present. To make this explicit for readers, we will add a short remark immediately after the statement of Theorem 4.1 in the revised manuscript. revision: partial
Circularity Check
No significant circularity; bounds and algorithm appear independently derived from first principles
full rationale
The abstract derives instance-specific lower bounds on sample complexity by balancing estimation precision on retained singular values against tail approximation error, then proposes an LS + universal SVT algorithm whose finite-sample error bounds are shown to nearly match those limits. No equations or steps in the provided text reduce a claimed prediction or bound to a fitted parameter by construction, nor do they rely on self-citation chains or imported uniqueness theorems for the central result. The adaptive rank selection is presented as data-driven without explicit reduction to the input data in a tautological way, and the analysis of matrix denoising is described as enhanced but independent. This is the common honest case of a self-contained derivation against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We propose an algorithm that combines a Least-Squares estimator with a universal singular value thresholding procedure... min_k [k σ² (d̄ + log(1/δ))/(n λ_min(Σ)) + Σ_{i>k} s_i²(A)]
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
instance-specific lower bounds... k⋆_{A,n} := arg min_k [ErrReg(k,n,Σ) + Σ_{i>k} s_i²(A)]
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Theodore Wilbur Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions.The Annals of Mathematical Statistics, pages 327–351, 1951
work page 1951
-
[2]
Theodore Wilbur Anderson. Asymptotic distribution of the reduced rank regression estimator under general conditions.The Annals of Statistics, 27(4):1141–1154, 1999
work page 1999
-
[3]
A new approach to learning linear dynamical systems
Ainesh Bakshi, Allen Liu, Ankur Moitra, and Morris Yau. A new approach to learning linear dynamical systems. InProceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 335–348, 2023
work page 2023
-
[4]
Daniel Barzilai and Ohad Shamir. Simple relative deviation bounds for covariance and gram matrices.arXiv preprint arXiv:2410.05754, 2024
-
[5]
Inequalities for the gamma function.Archiv der Mathematik, 91(6):554–563, 2008
Necdet Batir. Inequalities for the gamma function.Archiv der Mathematik, 91(6):554–563, 2008
work page 2008
-
[6]
Florentina Bunea, Yiyuan She, and Marten Wegkamp. Optimal selection of reduced rank estimators of high- dimensional matrices.The Annals of Statistics, 39, 04 2010
work page 2010
-
[7]
Tony Cai, Zongming Ma, and Yihong Wu. Optimal estimation and rank detection for sparse spiked covariance matrices.Probability theory and related fields, 161(3):781–815, 2015
work page 2015
-
[8]
Exact matrix completion via convex optimization.Commun
Emmanuel Candès and Benjamin Recht. Exact matrix completion via convex optimization.Commun. ACM, 55(6):111–119, June 2012
work page 2012
-
[9]
Matrix completion with noise.Proceedings of the IEEE, 98(6):925–936, 2010
Emmanuel J Candes and Yaniv Plan. Matrix completion with noise.Proceedings of the IEEE, 98(6):925–936, 2010
work page 2010
-
[10]
Emmanuel J Candes and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements.IEEE Transactions on Information Theory, 57(4):2342–2359, 2011
work page 2011
-
[11]
Sourav Chatterjee. Matrix estimation by universal singular value thresholding.The Annals of Statistics, 43(1):177– 214, 2015
work page 2015
-
[12]
Reduced rank regression via adaptive nuclear norm penalization
Kun Chen, Hongbo Dong, and Kung-Sik Chan. Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 100(4):901–920, 09 2013
work page 2013
-
[13]
Boualem Djehiche and Othmane Mazhar. Non asymptotic estimation lower bounds for lti state space models with cram\’er-rao and van trees.arXiv preprint arXiv:2109.08582, 2021
-
[14]
Boualem Djehiche and Othmane Mazhar. Efficient learning of hidden state lti state space models of unknown order.arXiv preprint arXiv:2202.01625, 2022
-
[15]
Ky Fan. Maximum properties and inequalities for the eigenvalues of completely continuous operators.Proceedings of the National Academy of Sciences, 37(11):760–766, 1951
work page 1951
-
[16]
Maryam Fazel, Ting Kei Pong, Defeng Sun, and Paul Tseng. Hankel matrix rank minimization with applications to system identification and realization.SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013
work page 2013
-
[17]
Matan Gavish and David L Donoho. The optimal hard threshold for singular values is 4/ √ 3.IEEE Transactions on Information Theory, 60(8):5040–5053, 2014
work page 2014
-
[18]
Fano’s inequality for random variables
Sebastien Gerchinovitz, Pierre Ménard, and Gilles Stoltz. Fano’s inequality for random variables. 2020
work page 2020
-
[19]
Oliver Henkel. Sphere-packing bounds in the grassmann and stiefel manifolds.IEEE Transactions on Information Theory, 51(10):3445–3456, 2005
work page 2005
-
[20]
Alan Julian Izenman. Reduced-rank regression for the multivariate linear model.Journal of Multivariate Analysis, 5(2):248–264, 1975
work page 1975
-
[21]
Sample complexity lower bounds for linear system identification
Yassir Jedra and Alexandre Proutiere. Sample complexity lower bounds for linear system identification. In2019 IEEE 58th Conference on Decision and Control (CDC), page 2676–2681. IEEE Press, 2019
work page 2019
-
[22]
Yassir Jedra and Alexandre Proutiere. Finite-time identification of linear systems: Fundamental limits and optimal algorithms.IEEE Transactions on Automatic Control, 68(5):2805–2820, 2022. 10
work page 2022
- [23]
-
[24]
High-probability minimax lower bounds.arXiv preprint arXiv:2406.13447, 2024
Tianyi Ma, Kabir A Verchand, and Richard J Samworth. High-probability minimax lower bounds.arXiv preprint arXiv:2406.13447, 2024
-
[25]
Inequalities: theory of majorization and its applications
Albert W Marshall, Ingram Olkin, and Barry C Arnold. Inequalities: theory of majorization and its applications. 1979
work page 1979
-
[26]
Simon Mataigne, P-A Absil, and Nina Miolane. Bounds on the geodesic distances on the stiefel manifold for a family of riemannian metrics.arXiv preprint arXiv:2408.07072, 2024
-
[27]
a matrix inequality associated with bounds on solutions of algebraic riccati and lyapunov equation
Takehiro Mori. Comments on" a matrix inequality associated with bounds on solutions of algebraic riccati and lyapunov equation" by jm saniuk and ib rhodes.IEEE transactions on automatic control, 33(11):1088, 1988
work page 1988
-
[28]
Sahand Negahban and Martin J. Wainwright. Estimation of (near) low-rank matrices with noise and high- dimensional scaling.The Annals of Statistics, 39(2):1069 – 1097, 2011
work page 2011
-
[29]
Non-asymptotic identification of lti systems from a single trajectory
Samet Oymak and Necmiye Ozay. Non-asymptotic identification of lti systems from a single trajectory. In2019 American control conference (ACC), pages 5655–5661. IEEE, 2019
work page 2019
-
[30]
Lecture notes in statistics Multivariate reduced-rank regression
Gregory C Reinsel and Rajabather Palani Velu.Multivariate reduced-rank regression : theory and applications. Lecture notes in statistics Multivariate reduced-rank regression. Springer New York, New York, NY , 1998
work page 1998
- [31]
-
[32]
Finite time lti system identification.Journal of Machine Learning Research, 22(26):1–61, 2021
Tuhin Sarkar, Alexander Rakhlin, and Munther A Dahleh. Finite time lti system identification.Journal of Machine Learning Research, 22(26):1–61, 2021
work page 2021
-
[33]
Learning linear dynamical systems with semi-parametric least squares
Max Simchowitz, Ross Boczar, and Benjamin Recht. Learning linear dynamical systems with semi-parametric least squares. InConference on Learning Theory, pages 2714–2802. PMLR, 2019
work page 2019
-
[34]
Learning without mixing: Towards a sharp analysis of linear system identification
Max Simchowitz, Horia Mania, Stephen Tu, Michael I Jordan, and Benjamin Recht. Learning without mixing: Towards a sharp analysis of linear system identification. InConference On Learning Theory, pages 439–473. PMLR, 2018
work page 2018
-
[35]
Shuai Sun, Jiayun Li, and Yilin Mo. Finite time performance analysis of mimo systems identification.arXiv preprint arXiv:2310.11790, 2023
-
[36]
Yue Sun, Samet Oymak, and Maryam Fazel. Finite sample identification of low-order lti systems via nuclear norm regularization.IEEE Open Journal of Control Systems, 1:237–254, 2022
work page 2022
-
[37]
Cambridge university press, 2019
Martin J Wainwright.High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019
work page 2019
-
[38]
Adaptive reduced rank regression
Qiong Wu, Felix M Wong, Yanhua Li, Zhenming Liu, and Varun Kanade. Adaptive reduced rank regression. Advances in Neural Information Processing Systems, 33:4103–4114, 2020
work page 2020
-
[39]
Optimal exact least squares rank minimization
Shuo Xiang, Yunzhang Zhu, Xiaotong Shen, and Jieping Ye. Optimal exact least squares rank minimization. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 480–488, 2012
work page 2012
-
[40]
Ming Yuan, Ali Ekici, Zhaosong Lu, and Renato Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346, 2007
work page 2007
-
[41]
Yuyang Zhang, Shahriar Talebi, and Na Li. Learning low-dimensional latent dynamics from high-dimensional observations: Non-asymptotics and lower bounds.arXiv preprint arXiv:2405.06089, 2024
-
[42]
A tutorial on the non-asymptotic theory of system identification
Ingvar Ziemann, Anastasios Tsiamis, Bruce Lee, Yassir Jedra, Nikolai Matni, and George J Pappas. A tutorial on the non-asymptotic theory of system identification. In2023 62nd IEEE Conference on Decision and Control (CDC), pages 8921–8939. IEEE, 2023. 11 Contents 1 Introduction 1 2 Preliminaries 2 3 Related Work 3 4 Instance-specific Sample Complexity Lowe...
work page 2023
-
[43]
Lemma A.4.There exists a universal constantc≈0.17such thatb≥e −1−log(2) 3 − 1 2 −c q k 2
The next lemma, proved below, gives a lower bound onb. Lemma A.4.There exists a universal constantc≈0.17such thatb≥e −1−log(2) 3 − 1 2 −c q k 2 . From this lemma, we deduce that the distance between two elements of the packing satisfies: d0 =ab≥ e −1−log(2) 3 − 1 2 −c 2 √ 2 √ k. Hence we have proved that there exists a packing of size2k(2d−k) ≥2 kd with m...
-
[44]
Lower bound that depends on dy.We start by applying Lemmas A.1 and A.2 to A using the confusing models fromC 1(A, r, ε). More precisely, introduce for alli∈[2 rdy], Ai =A+ 2Cε√r QiW ⊤ −r,whereQ i ∈ P dy r , andC≥1is a universal constant previously defined in Lemma 4.1. Application of Lemma A.1: We have ∀i∈[2 rdy],E Ai[LN(Ai, A)] = 1 2σ2 Tr (A−A i)⊤(A−A i)...
-
[45]
Application of Lemma A.1: We use the confusing models fromC 2(A, r, ε)
Lower bound that depends on dx.This case is relevant only for the multivariate regression since in this case, dx might differ fromd y. Application of Lemma A.1: We use the confusing models fromC 2(A, r, ε). Introduce for alli∈[2 rdy], Ai =A+ 2Cε√r UrR⊤ i ,whereR i ∈ P dx r . Then EAi[LN(Ai, A)] = N 2σ2 Tr((A−A i)⊤(A−A i)Σ) = 4N C2ε2 2σ2r Tr(UrR⊤ i ΣRiU ⊤ ...
-
[46]
Lower bounds that depends ond y: Consider the confusing models fromC 1(A, k⋆ A,N , ε): ∀i∈[2 k⋆ A,N dy], A i =A+ 2Cε ′ q k⋆ A,N QiW ⊤ −k⋆ A,N ,whereQ i ∈ P dy k⋆ A,N . We apply Lemmas A.1 and A.2 as in the proof of Theorem 4.3 (replace r by k⋆ A,N , and ε by ε′), and derive following lower bound: ε′2 ≥ σ2 log(2) 8 log( 1 δ ) +k ⋆ A,N dy N λk⋆ A,N (Σ) . 20...
-
[47]
Lower bound that depends ond x: The proof follows along the same lines as those in the proof of Theorem 4.3. A.5 Extremal partial trace Lemma A.5.For anyk≤dand two matricesQ∈St d k(R),Σ∈R d×d symmetric positive definite, we have dX i=d−k+1 λi(Σ)≤Tr(Q ⊤ΣQ)≤ kX i=1 λi(Σ).(19) Proof. The proof of this classical result, sometimes referred to asextremal partia...
-
[48]
The true error of T-LSE and our upper bound are close
As a function of n (d= 50 , r= 10 ). The true error of T-LSE and our upper bound are close. Both exhibit a hyperbolic decrease towards zero as can be seen in Figure 5. Figure 5: Performance ofT-LSEand its corresponding error upper bound as a function ofn
-
[49]
As a function of d (n= 5000 , r= 10 ). We observe in Figure 6 that the T-LSE error is linear in d as predicted by the upper bound (and lower bound). For the upper bound, the observed deviation from perfect linearity arises from the degradation of the accuracy of the empirical eigenvalueλ H r (ˆΣ)asnis kept fixed. Figure 6: Performance ofT-LSEand its corre...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.