Joint Identification of Linear Dynamics and Noise Covariance via Distributional Estimation
Pith reviewed 2026-05-10 12:26 UTC · model grok-4.3
The pith
A parameterization of the state-transition distribution enables joint estimation of the linear dynamics matrix A and the noise covariance Σ under non-Gaussian noise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By reparameterizing the conditional distribution of the next state given the current state, the joint problem of recovering A and Σ becomes tractable for non-Gaussian noise distributions. Whenever the chosen family is rich enough to represent the true noise law, the resulting maximum-likelihood and score-matching estimators are consistent and achieve lower finite-sample error than ordinary least squares.
What carries the argument
The novel parameterization of the state-transition distribution, which encodes the full distributional shape of the driving noise and thereby supplies extra information for separating A from Σ.
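One way to read this parameterization, following the location-scale description that appears in the simulated author's rebuttal on this page, is sketched here. The base density $f_\phi$ and the shape parameter $\phi$ are illustrative notation, not symbols taken from the paper, and the paper's exact construction may differ.

```latex
x_{t+1} = A x_t + w_t, \qquad w_t = \varSigma^{1/2} \varepsilon_t, \qquad
\varepsilon_t \sim f_\phi, \quad \mathbb{E}[\varepsilon_t] = 0, \quad \operatorname{Cov}(\varepsilon_t) = I

p_{A,\varSigma,\phi}(x_{t+1} \mid x_t)
  = \det(\varSigma)^{-1/2} \, f_\phi\!\left( \varSigma^{-1/2} (x_{t+1} - A x_t) \right)
```

Under this reading, A sets the location, Σ sets the scale, and $\phi$ carries the shape. A Gaussian $f_\phi$ recovers the familiar setting in which only the first two moments matter.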
If this is right
- The estimators remain consistent even when the noise is non-Gaussian, provided the parameterization is correctly specified.
- Sample-complexity bounds quantify how many transitions are needed to reach a given accuracy level for both A and Σ.
- The same data set yields simultaneous estimates of dynamics and noise covariance, removing the need for separate experiments.
- Simulation evidence indicates lower error than ordinary least squares under matched distributional assumptions.
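For reference, the OLS baseline named in the last point can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function names `simulate_linear` and `ols_identify` are hypothetical, and Gaussian noise is used only to generate example data.

```python
import numpy as np

def simulate_linear(A, Sigma, T, rng):
    """Roll out x_{t+1} = A x_t + w_t with w_t ~ N(0, Sigma), starting at x_0 = 0."""
    n = A.shape[0]
    L = np.linalg.cholesky(Sigma)           # Sigma = L L^T
    X = np.zeros((T + 1, n))
    for t in range(T):
        X[t + 1] = A @ X[t] + L @ rng.standard_normal(n)
    return X

def ols_identify(X):
    """Estimate A by least squares and Sigma by the residual covariance."""
    X0, X1 = X[:-1], X[1:]                   # rows are x_t and x_{t+1}
    A_T, *_ = np.linalg.lstsq(X0, X1, rcond=None)  # solves X0 @ A^T ~= X1
    residuals = X1 - X0 @ A_T
    Sigma_hat = residuals.T @ residuals / len(residuals)
    return A_T.T, Sigma_hat
```

Both estimates come from the same set of transitions, but OLS uses only second-moment information, which is the gap the proposed estimators aim to close.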
Where Pith is reading between the lines
- The same parameterization idea could be tested on partially observed systems by combining it with filtering techniques.
- If the method scales to moderate-dimensional state spaces, it may reduce the data requirements for covariance-aware controller design.
- One could replace the current estimators with online or recursive versions to track slowly time-varying A or Σ.
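As a rough picture of what a recursive version could look like, the sketch below applies textbook recursive least squares with a forgetting factor to the dynamics matrix only. It is a stand-in built on the least-squares criterion, not a recursive form of the paper's MLE or SME, and the class name and default values are assumptions.

```python
import numpy as np

class RecursiveLeastSquares:
    """Textbook RLS for x_{t+1} ~= A x_t with exponential forgetting (hypothetical helper)."""

    def __init__(self, n, forgetting=0.99, delta=1e3):
        self.A = np.zeros((n, n))        # running estimate of A
        self.P = delta * np.eye(n)       # large initial P means a weak prior on A
        self.lam = forgetting            # lam < 1 discounts old transitions

    def update(self, x, x_next):
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        error = x_next - self.A @ x      # one-step prediction error
        self.A += np.outer(error, gain)
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return self.A
```

A forgetting factor below one shortens the effective data window, which trades estimation variance for the ability to follow drift in A.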
Load-bearing premise
The chosen family of distributions for the state transitions must be able to represent the actual noise law; otherwise the extra shape information is unavailable and the claimed accuracy gains disappear.
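The Gaussian case makes this premise concrete. If the assumed family contains only Gaussian transitions, the conditional log-likelihood over T transitions and its maximizers take the following form (a standard derivation written in this page's notation, not quoted from the paper, and assuming the sum of $x_t x_t^\top$ is invertible):

```latex
\ell(A, \varSigma)
  = -\frac{T}{2} \log\det(2\pi\varSigma)
    - \frac{1}{2} \sum_{t=0}^{T-1} (x_{t+1} - A x_t)^\top \varSigma^{-1} (x_{t+1} - A x_t)

\hat{A} = \Big( \sum_{t=0}^{T-1} x_{t+1} x_t^\top \Big) \Big( \sum_{t=0}^{T-1} x_t x_t^\top \Big)^{-1},
\qquad
\hat{\varSigma} = \frac{1}{T} \sum_{t=0}^{T-1} (x_{t+1} - \hat{A} x_t)(x_{t+1} - \hat{A} x_t)^\top
```

These maximizers are exactly the OLS estimate of A and the residual covariance, so a family with no shape freedom gives the likelihood nothing beyond second moments to exploit.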
What would settle it
Run the two new estimators on synthetic data generated from a noise distribution outside the assumed family and compare their estimation error for A and Σ against ordinary least squares; if the new estimators show no systematic improvement, the central claim is false.
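A scalar version of this experiment can be sketched as follows. The paper's estimators are not reproduced here: a Laplace-family MLE stands in for them (its slope estimate is a weighted median, and its implied variance is 2b² for Laplace scale b), the data come from Student-t noise that lies outside the Laplace family, and all function names and default values are assumptions.

```python
import numpy as np

def simulate_scalar(a, T, df, rng):
    """x_{t+1} = a * x_t + w_t with Student-t noise (df degrees of freedom)."""
    x = np.zeros(T + 1)
    w = rng.standard_t(df, size=T)
    for t in range(T):
        x[t + 1] = a * x[t] + w[t]
    return x

def ols_fit(x):
    x0, x1 = x[:-1], x[1:]
    a_hat = (x0 @ x1) / (x0 @ x0)
    return a_hat, np.mean((x1 - a_hat * x0) ** 2)

def laplace_mle_fit(x):
    """MLE under an assumed Laplace noise law: least absolute deviations for a."""
    x0, x1 = x[:-1], x[1:]
    keep = x0 != 0.0
    ratios, weights = x1[keep] / x0[keep], np.abs(x0[keep])
    order = np.argsort(ratios)
    cum = np.cumsum(weights[order])
    # Weighted median of the ratios minimizes sum |x1 - a * x0|.
    a_hat = ratios[order][np.searchsorted(cum, 0.5 * cum[-1])]
    b_hat = np.mean(np.abs(x1 - a_hat * x0))   # Laplace scale MLE
    return a_hat, 2.0 * b_hat**2               # variance implied by the Laplace model

def compare(a=0.8, T=2000, df=5, trials=20, seed=0):
    """Return RMSE of (a_hat, var_hat) per estimator under out-of-family noise."""
    rng = np.random.default_rng(seed)
    true_var = df / (df - 2)                   # Student-t variance, needs df > 2
    errors = {"ols": [], "laplace_mle": []}
    for _ in range(trials):
        x = simulate_scalar(a, T, df, rng)
        for name, fit in (("ols", ols_fit), ("laplace_mle", laplace_mle_fit)):
            a_hat, var_hat = fit(x)
            errors[name].append((a_hat - a, var_hat - true_var))
    return {k: np.sqrt(np.mean(np.square(v), axis=0)) for k, v in errors.items()}
```

Calling `compare()` returns, for each estimator, a pair of root-mean-square errors for the dynamics coefficient and the noise variance. Swapping in a noise law that does belong to the assumed family gives the matched-case comparison.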
Original abstract
In this paper, we propose a novel framework for the joint identification of system dynamics and noise covariance in linear systems, under general noise distributions beyond Gaussian. Specifically, we would like to simultaneously estimate the dynamical matrix $A$ and the noise covariance matrix $\varSigma$ using state transition data. The formulation builds upon a novel parameterization of the state-transition distribution, which enables more effective use of distributional "shape" information for improved identification accuracy. We introduce two practical estimators, namely the maximum likelihood estimator (MLE) and the score-matching estimator (SME), to solve the joint dynamics-covariance identification problem, and provide rigorous analysis of their statistical properties and sample complexity. Simulation results show that the proposed estimators outperform the ordinary least squares (OLS) baseline.
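For orientation, the two estimators named in the abstract can be written in their generic textbook forms. Here $\theta$ collects A, Σ, and any shape parameters, $p_\theta$ is the parameterized transition density, and T is the number of transitions. This notation is an assumption made for illustration, and the paper's exact objectives may differ.

```latex
\hat{\theta}_{\mathrm{MLE}}
  = \arg\max_{\theta} \; \frac{1}{T} \sum_{t=0}^{T-1} \log p_\theta(x_{t+1} \mid x_t)

s_\theta(x' \mid x) = \nabla_{x'} \log p_\theta(x' \mid x), \qquad
\hat{\theta}_{\mathrm{SME}}
  = \arg\min_{\theta} \; \frac{1}{T} \sum_{t=0}^{T-1}
    \left[ \tfrac{1}{2} \big\| s_\theta(x_{t+1} \mid x_t) \big\|^2
         + \operatorname{tr}\!\big( \nabla_{x'} s_\theta(x_{t+1} \mid x_t) \big) \right]
```

The second objective is Hyvärinen's score-matching criterion applied to the conditional density. It depends on $p_\theta$ only through derivatives of its logarithm with respect to the next state, so no normalizing constant is needed.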
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a framework for jointly identifying the linear dynamics matrix A and the noise covariance Σ in linear systems from state-transition data, under general (non-Gaussian) noise distributions. It introduces a novel parameterization of the conditional state-transition distribution to exploit distributional shape information, and develops two estimators: the maximum likelihood estimator (MLE) and the score-matching estimator (SME). The paper analyzes their statistical properties and sample complexity, and reports simulation results in which the proposed estimators outperform ordinary least squares (OLS).
Significance. If the central claims hold, particularly the expressiveness of the parameterization and the consistency of the estimators, this could represent a meaningful advance in system identification by moving beyond moment-based methods like OLS to utilize full distributional information. The rigorous analysis and simulation comparisons are positive features that would support the contribution if the parameterization's generality is established.
major comments (3)
- [§2] Novel parameterization: The parameterization of the state-transition distribution is introduced as the key enabler for using distributional shape information beyond second moments, but no explicit characterization of the function family (e.g., location-scale, exponential family, or mixture class) or density argument establishing expressiveness for arbitrary noise distributions is provided. This directly undermines the claims of applicability to 'general noise distributions beyond Gaussian' and the superiority of MLE/SME.
- [Theorem 3.2] Consistency of MLE: The proof assumes the true conditional distribution lies in the parameterized family; no misspecification analysis or robustness result is given, yet the abstract claims the method works for general noises. If the family is misspecified, the estimator need not recover the true A and Σ, which is load-bearing for the statistical guarantees.
- [§5, Table 1] Simulations: The reported RMSE improvements over OLS are shown only for selected non-Gaussian noises; without confirming that these noises lie outside the parameterization or providing quantitative coverage metrics, the results do not verify the general-case claim.
minor comments (2)
- [§2.1] Notation: The parameterized density p_θ(x_{t+1}|x_t) is not consistently subscripted with the joint parameters (A,Σ) in all subsequent estimator derivations, which obscures how the shape information enters the objective.
- [§1.2] Related work: Standard references on score-matching estimation for dynamical systems and on robust system identification under non-Gaussian noise are absent.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the parameterization, consistency results, and empirical validation. The comments highlight areas where additional rigor and clarification will strengthen the manuscript. We address each major comment below and will incorporate revisions to provide the requested characterizations, assumptions, and metrics.
Point-by-point responses
- Referee: §2 (novel parameterization): The parameterization of the state-transition distribution is introduced as the key enabler for using distributional shape information beyond second moments, but no explicit characterization of the function family (e.g., location-scale, exponential family, or mixture class) or density argument establishing expressiveness for arbitrary noise distributions is provided. This directly undermines the claims of applicability to 'general noise distributions beyond Gaussian' and the superiority of MLE/SME.
  Authors: We agree that an explicit characterization of the parameterized family is required to support the claims. In Section 2 the parameterization is defined as a conditional location-scale model with location A x, scale Σ, and flexible shape parameters on the base density. In the revision we will add a formal subsection characterizing the family (as a location-scale family with base density from a rich parametric class such as finite mixtures or kernel-based densities) together with a density argument establishing that the family is dense in the space of continuous conditional distributions under standard regularity conditions. This will clarify that 'general' refers to distributions beyond the Gaussian subfamily while remaining within a parametric class, and we will update the abstract and introduction accordingly.
  Revision: yes
- Referee: Theorem 3.2 (consistency of MLE): The proof assumes the true conditional distribution lies in the parameterized family; no misspecification analysis or robustness result is given, yet the abstract claims the method works for general noises. If the family is misspecified, the estimator need not recover the true A and Σ, which is load-bearing for the statistical guarantees.
  Authors: Theorem 3.2 establishes consistency under the standard assumption of correct specification (true distribution lies in the family). We will revise the theorem statement and surrounding text to make this assumption explicit. We will also add a new remark discussing misspecification: because the parameterization separates the conditional mean and covariance (governed by A and Σ) from higher-order shape parameters, the MLE for A and Σ converges to the true values whenever the first two moments are correctly captured, even under shape misspecification. This provides a robustness guarantee for the quantities of primary interest; the added remark will include a brief sketch of the argument.
  Revision: yes
- Referee: §5 (simulations, Table 1): The reported RMSE improvements over OLS are shown only for selected non-Gaussian noises; without confirming that these noises lie outside the parameterization or providing quantitative coverage metrics, the results do not verify the general-case claim.
  Authors: The simulation noises (Student's t with low degrees of freedom and Gaussian mixtures) were chosen precisely because they exhibit non-Gaussian features. In the revision we will add a paragraph in Section 5 that (i) verifies these distributions lie outside the Gaussian subfamily of the parameterization and (ii) reports quantitative coverage metrics, specifically the minimal KL divergence between each true noise distribution and its best fit within the parameterized family. We will also expand the simulation suite with one additional noise type and update Table 1 to include these metrics, thereby directly addressing the general-case claim.
  Revision: yes
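The coverage metric mentioned in the last response can be illustrated on a small case. The sketch below computes the KL divergence from a scalar Student-t noise law to its best Gaussian fit by numerical integration on a grid. The Gaussian family is only a stand-in for the paper's parameterized family (for Gaussians, the KL-minimizing fit is the moment-matched one), and the function names and grid settings are assumptions.

```python
import math
import numpy as np

def student_t_logpdf(x, df):
    """Log-density of the standard Student-t distribution."""
    const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return const - (df + 1) / 2 * np.log1p(x**2 / df)

def gaussian_logpdf(x, var):
    """Log-density of a zero-mean Gaussian with variance var."""
    return -0.5 * np.log(2 * np.pi * var) - x**2 / (2 * var)

def kl_to_moment_matched_gaussian(df, half_width=200.0, n=400_001):
    """KL(t_df || N(0, var)) with var matched to the Student-t variance (needs df > 2)."""
    var = df / (df - 2)
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    log_p = student_t_logpdf(x, df)
    log_q = gaussian_logpdf(x, var)
    # Riemann sum of p(x) * (log p(x) - log q(x)) over the grid.
    return float(np.sum(np.exp(log_p) * (log_p - log_q)) * dx)
```

A value near zero would indicate that the family covers the noise law well, while larger values quantify the gap that a richer family would need to close.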
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper introduces a novel parameterization of the state-transition distribution along with MLE and SME estimators, then derives statistical properties and sample-complexity bounds for them. No quoted equations or claims reduce these new objects to quantities already fitted from the same data by construction, nor do they rely on self-citation chains that collapse the central claim. The framework is presented as self-contained with independent analysis of the estimators' behavior under the stated parameterization.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The underlying system is linear with additive noise whose distribution belongs to a parameterized family that can be estimated from transitions.
invented entities (1)
- Novel parameterization of the state-transition distribution: no independent evidence.