Joint Identification of Linear Dynamics and Noise Covariance via Distributional Estimation

Na Li; Yang Hu

arXiv: 2604.14130 · v1 · submitted 2026-04-15 · 📡 eess.SY · cs.SY · math.DS

Pith reviewed 2026-05-10 12:26 UTC · model grok-4.3

classification 📡 eess.SY · cs.SY · math.DS

keywords linear systems, system identification, noise covariance estimation, distributional estimation, maximum likelihood, score matching, state transitions

The pith

A parameterization of state-transition distributions enables joint estimation of linear dynamics matrix A and noise covariance Σ for non-Gaussian noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a framework that estimates both the dynamical matrix A and the noise covariance Σ at the same time from sequences of state transitions. It does so by introducing a parameterization that lets the estimator draw on the full shape of the observed transition distribution rather than only its first two moments. Two concrete estimators are derived: a maximum-likelihood version and a score-matching version, each supplied with consistency results and sample-complexity bounds. When the parameterization matches the true noise law, the approach recovers both quantities more accurately than ordinary least-squares regression on the same data.

Core claim

By reparameterizing the conditional distribution of the next state given the current state, the joint problem of recovering A and Σ becomes tractable for arbitrary noise distributions; the resulting maximum-likelihood and score-matching estimators are consistent and achieve improved finite-sample accuracy whenever the chosen family is rich enough to represent the true noise law.

What carries the argument

The novel parameterization of the state-transition distribution, which encodes the full distributional shape of the driving noise and thereby supplies extra information for separating A from Σ.
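
To make the idea concrete, here is one hypothetical form such a parameterization could take. The abstract does not state the family; the simulated rebuttal further down describes it as a location-scale model, so that is what is sketched here. The base density $f$ (zero mean, identity covariance) is an assumption introduced for illustration, not a symbol from the paper.

```latex
\[
x_{t+1} = A x_t + w_t, \qquad \mathbb{E}[w_t] = 0, \quad \operatorname{Cov}(w_t) = \Sigma,
\]
\[
p_{A,\Sigma}(x_{t+1} \mid x_t)
  = \det(\Sigma)^{-1/2}\, f\!\left(\Sigma^{-1/2}\,(x_{t+1} - A x_t)\right).
\]
```

With a Gaussian $f$, maximizing the likelihood in $A$ reduces to least squares; a non-Gaussian $f$ weights the residuals differently, which is where shape information would enter.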

If this is right

The estimators remain consistent even when the noise is non-Gaussian, provided the parameterization is correctly specified.
Sample-complexity bounds quantify how many transitions are needed to reach a given accuracy level for both A and Σ.
The same data set yields simultaneous estimates of dynamics and noise covariance, removing the need for separate experiments.
Simulation evidence indicates lower error than ordinary least squares under matched distributional assumptions.
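
For reference, the OLS baseline mentioned in the last point can be sketched in a few lines. This is a minimal illustration assuming fully observed states and a single trajectory; the function name `ols_identify`, the test system `A_true`, and the unit-covariance Gaussian noise are choices made here, not taken from the paper.

```python
import numpy as np

def ols_identify(X, Y):
    """Least-squares estimate of A and the residual covariance.

    X, Y: arrays of shape (T, n) whose rows are x_t and x_{t+1}.
    """
    # lstsq solves X @ B ≈ Y in the least-squares sense, with B = A^T.
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ B
    sigma_hat = residuals.T @ residuals / X.shape[0]
    return B.T, sigma_hat

# Hypothetical 2-d stable system driven by unit-covariance Gaussian noise.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
T = 5000
x = np.zeros((T + 1, 2))
for t in range(T):
    x[t + 1] = A_true @ x[t] + rng.standard_normal(2)

A_hat, Sigma_hat = ols_identify(x[:-1], x[1:])
print("error in A:", np.linalg.norm(A_hat - A_true))
print("error in Sigma:", np.linalg.norm(Sigma_hat - np.eye(2)))
```

OLS uses only first and second moments of the transitions, which is the limitation the distributional estimators are meant to address.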

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same parameterization idea could be tested on partially observed systems by combining it with filtering techniques.
If the method scales to moderate-dimensional state spaces, it may reduce the data requirements for covariance-aware controller design.
One could replace the current estimators with online or recursive versions to track slowly time-varying A or Σ.
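
As a rough illustration of the last extension, a recursive least-squares update with a forgetting factor is one standard way to track a slowly varying $A$. This is a generic textbook recursion, not an online version of the paper's MLE or SME; `rls_step`, `lam`, and `delta` are names introduced here, and tracking Σ is not covered.

```python
import numpy as np

def rls_step(A_hat, P, x, y, lam=1.0):
    """One recursive least-squares update for y ≈ A x.

    lam in (0, 1] is a forgetting factor; lam = 1 recovers ordinary RLS.
    """
    Px = P @ x
    k = Px / (lam + x @ Px)                      # gain vector
    A_hat = A_hat + np.outer(y - A_hat @ x, k)   # correct by the prediction error
    P = (P - np.outer(k, Px)) / lam              # rank-one update of the inverse Gram matrix
    return A_hat, P

rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])      # hypothetical system
n, T, delta = 2, 500, 1e-2
A_hat, P = np.zeros((n, n)), np.eye(n) / delta   # P starts as (delta * I)^{-1}
x = np.zeros(n)
xs, ys = [], []
for _ in range(T):
    y = A_true @ x + rng.standard_normal(n)
    A_hat, P = rls_step(A_hat, P, x, y)
    xs.append(x)
    ys.append(y)
    x = y

# With lam = 1 the recursion reproduces the batch ridge solution.
X, Y = np.array(xs), np.array(ys)
A_batch = Y.T @ X @ np.linalg.inv(X.T @ X + delta * np.eye(n))
```

Choosing `lam` slightly below 1 discounts old transitions geometrically, which is what would allow the estimate to follow drift in $A$.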

Load-bearing premise

The chosen family of distributions for the state transitions must be able to represent the actual noise law; otherwise the extra shape information is unavailable and the claimed accuracy gains disappear.

What would settle it

Run the two new estimators on synthetic data generated from a noise distribution outside the assumed family and compare their estimation error for A and Σ against ordinary least squares; if the new estimators show no systematic improvement, the central claim is false.
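
The experiment described above could be prototyped along the following lines. This is a sketch under explicit assumptions: the paper's estimators are not reproduced here, so a multivariate Student-t likelihood with known degrees of freedom, fitted by the standard EM reweighting scheme, stands in for "the MLE", and uniform noise stands in for "a distribution outside the assumed family". All function names and the test system are hypothetical.

```python
import numpy as np

def ols(X, Y):
    """Least-squares estimate of A and residual covariance; rows are x_t, x_{t+1}."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)    # solves X @ B ≈ Y, with B = A^T
    R = Y - X @ B
    return B.T, R.T @ R / len(X)

def student_t_mle(X, Y, nu, iters=100):
    """EM (iteratively reweighted least squares) for x_{t+1} = A x_t + w_t,
    with w_t multivariate Student-t, KNOWN dof nu > 2 (an assumption here)."""
    T, n = Y.shape
    A, S = ols(X, Y)                             # initialise at the OLS fit
    for _ in range(iters):
        R = Y - X @ A.T
        delta = np.einsum("ti,ij,tj->t", R, np.linalg.inv(S), R)
        u = (nu + n) / (nu + delta)              # E-step: down-weight large residuals
        Xw = X * u[:, None]
        A = np.linalg.solve(Xw.T @ X, Xw.T @ Y).T    # weighted least squares for A
        R = Y - X @ A.T
        S = (R * u[:, None]).T @ R / T           # scale-matrix update
    return A, S * nu / (nu - 2.0)                # covariance = nu / (nu - 2) * scale

def simulate(A, T, sample_noise, rng):
    n = A.shape[0]
    x = np.zeros((T + 1, n))
    for t in range(T):
        x[t + 1] = A @ x[t] + sample_noise(rng, n)
    return x[:-1], x[1:]

def t_noise(rng, n, nu=3.0):                     # inside the assumed family
    return rng.standard_normal(n) / np.sqrt(rng.chisquare(nu) / nu)

def uniform_noise(rng, n):                       # outside the assumed family
    return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
errors = {}
for name, noise in [("student_t", t_noise), ("uniform", uniform_noise)]:
    X, Y = simulate(A_true, 2000, noise, rng)
    A_ols, _ = ols(X, Y)
    A_mle, _ = student_t_mle(X, Y, nu=3.0)
    errors[name] = (np.linalg.norm(A_ols - A_true), np.linalg.norm(A_mle - A_true))
    print(name, "OLS error %.4f, t-MLE error %.4f" % errors[name])
```

A single run like this settles nothing; a meaningful comparison would average over many seeds and also report the error in Σ, where the effect of misspecification is likely to show most directly.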

Figures

Figures reproduced from arXiv: 2604.14130 by Na Li, Yang Hu.

**Figure 2.** Identification error of MLE and OLS estimators for di…

**Figure 3.** Identification error of different implementations of MLE for different sample sizes (2-d Student-t family, log scale, shadowed confidence interval). (a) 2-d system, different sample sizes; (b) 256 samples, different system dimensions.

**Figure 4.** Average computation time of MLE, SME and OLS estimators for di…

**Figure 5.** Identification error of MLE estimators for mis-specified base densities at di…

Original abstract

In this paper, we propose a novel framework for the joint identification of system dynamics and noise covariance in linear systems, under general noise distributions beyond Gaussian. Specifically, we would like to simultaneously estimate the dynamical matrix $A$ and the noise covariance matrix $\varSigma$ using state transition data. The formulation builds upon a novel parameterization of the state-transition distribution, which enables more effective use of distributional "shape" information for improved identification accuracy. We introduce two practical estimators, namely the maximum likelihood estimator (MLE) and the score-matching estimator (SME), to solve the joint dynamics-covariance identification problem, and provide rigorous analysis of their statistical properties and sample complexity. Simulation results show that the proposed estimators outperform the ordinary least squares (OLS) baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a distributional parameterization to jointly recover linear dynamics A and noise covariance from transitions, with MLE and score-matching estimators plus sample-complexity claims, but the parameterization's range for general non-Gaussian noise is underspecified.

The main thing to know is that this work introduces a parameterization of the state-transition distribution meant to capture both the system matrix and the full noise shape, then uses that to build maximum-likelihood and score-matching estimators for the joint problem. The abstract positions this as an improvement over ordinary least squares when noise is non-Gaussian, and it supplies statistical analysis plus simulation comparisons.

That framing addresses a real, recurring need in system identification where people often want covariance estimates without forcing a Gaussian assumption. The two estimators and the sample-complexity results are the concrete new pieces. If the derivations hold, they give practitioners something more structured than ad-hoc moment matching. The simulations reportedly show gains, which is at least a useful sanity check against the baseline.

The soft spot is the parameterization itself. The abstract does not state the explicit family or the conditions under which it can represent arbitrary noise distributions while staying tractable. If the family is narrower than claimed, the accuracy improvements and consistency guarantees shrink accordingly, and the advantage over OLS becomes case-by-case rather than general. Without seeing the exact form and the proof assumptions, it is difficult to judge how much of the distributional shape is actually being used versus implicitly restricted.

This is aimed at researchers in control and signal processing who work on linear system identification with unknown or non-Gaussian disturbances. Someone already thinking about robust or distribution-aware methods would get the most out of the estimators and bounds. The paper has enough structure and stated results to deserve a serious referee, though the review should focus on verifying the parameterization's expressiveness and the tightness of the sample-complexity statements under realistic noise. I would send it to peer review with that emphasis.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a novel framework for jointly identifying the linear dynamics matrix A and the noise covariance Σ in linear systems from state transition data, under general (non-Gaussian) noise distributions. It introduces a novel parameterization of the conditional state-transition distribution to leverage distributional shape information, and develops two estimators: the maximum likelihood estimator (MLE) and the score-matching estimator (SME). The paper provides statistical analysis of their properties and sample complexity, and reports simulation results showing outperformance over ordinary least squares (OLS).
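
For orientation, the score-matching idea can be written out in the Gaussian special case, where everything is available in closed form. This is a sketch under stated assumptions, not the paper's derivation: it assumes the estimator minimizes Hyvärinen's score-matching objective applied to the conditional density of $x_{t+1}$ given $x_t$, and the symbols $P$, $r_t$ and $\hat{C}$ are introduced here for illustration.

```latex
% Score-matching objective on the conditional density (y = x_{t+1}, x = x_t):
\[
J(\theta) = \mathbb{E}\Big[\tfrac{1}{2}\,\big\|\nabla_{y} \log p_\theta(y \mid x)\big\|^2
            + \Delta_{y} \log p_\theta(y \mid x)\Big].
\]
% Gaussian noise, with P = \Sigma^{-1} and r_t = x_{t+1} - A x_t:
\[
\nabla_{y} \log p_\theta = -P r, \qquad \Delta_{y} \log p_\theta = -\operatorname{tr}(P),
\]
\[
\hat{J}(A, P) = \tfrac{1}{2}\operatorname{tr}\!\big(P^2\, \hat{C}(A)\big) - \operatorname{tr}(P),
\qquad \hat{C}(A) = \frac{1}{T}\sum_{t} r_t r_t^{\top}.
\]
% Minimizing over A gives the OLS normal equations \sum_t r_t x_t^\top = 0 for any P \succ 0.
% Stationarity in P then reads
\[
\tfrac{1}{2}\big(P \hat{C} + \hat{C} P\big) = I
\quad\Longrightarrow\quad P = \hat{C}^{-1}, \qquad \hat{\Sigma} = \hat{C}.
\]
```

In this special case the score-matching estimate coincides with OLS plus the residual covariance, so any accuracy gain would have to come from non-Gaussian shape information in the assumed family.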

Significance. If the central claims hold, particularly the expressiveness of the parameterization and the consistency of the estimators, this could represent a meaningful advance in system identification by moving beyond moment-based methods like OLS to utilize full distributional information. The rigorous analysis and simulation comparisons are positive features that would support the contribution if the parameterization's generality is established.

major comments (3)

[§2] Novel parameterization: The parameterization of the state-transition distribution is introduced as the key enabler for using distributional shape information beyond second moments, but no explicit characterization of the function family (e.g., location-scale, exponential family, or mixture class) or density argument establishing expressiveness for arbitrary noise distributions is provided. This directly undermines the claims of applicability to 'general noise distributions beyond Gaussian' and the superiority of MLE/SME.
[Theorem 3.2] Consistency of MLE: The proof assumes the true conditional distribution lies in the parameterized family; no misspecification analysis or robustness result is given, yet the abstract claims the method works for general noises. If the family is misspecified, the estimator need not recover the true A and Σ, which is load-bearing for the statistical guarantees.
[§5, Table 1] Simulations: The reported RMSE improvements over OLS are shown only for selected non-Gaussian noises; without confirming that these noises lie outside the parameterization or providing quantitative coverage metrics, the results do not verify the general-case claim.

minor comments (2)

[§2.1] Notation in §2.1: The parameterized density p_θ(x_{t+1}|x_t) is not consistently subscripted with the joint parameters (A,Σ) in all subsequent estimator derivations, which obscures how the shape information enters the objective.
[§1.2] Related work: Standard references on score-matching estimation for dynamical systems and on robust system identification under non-Gaussian noise are absent.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on the parameterization, consistency results, and empirical validation. The comments highlight areas where additional rigor and clarification will strengthen the manuscript. We address each major comment below and will incorporate revisions to provide the requested characterizations, assumptions, and metrics.

Referee: §2 (novel parameterization): The parameterization of the state-transition distribution is introduced as the key enabler for using distributional shape information beyond second moments, but no explicit characterization of the function family (e.g., location-scale, exponential family, or mixture class) or density argument establishing expressiveness for arbitrary noise distributions is provided. This directly undermines the claims of applicability to 'general noise distributions beyond Gaussian' and the superiority of MLE/SME.

Authors: We agree that an explicit characterization of the parameterized family is required to support the claims. In Section 2 the parameterization is defined as a conditional location-scale model with location A x, scale Σ, and flexible shape parameters on the base density. In the revision we will add a formal subsection characterizing the family (as a location-scale family with base density from a rich parametric class such as finite mixtures or kernel-based densities) together with a density argument establishing that the family is dense in the space of continuous conditional distributions under standard regularity conditions. This will clarify that 'general' refers to distributions beyond the Gaussian subfamily while remaining within a parametric class, and we will update the abstract and introduction accordingly.

Revision: yes

Referee: Theorem 3.2 (consistency of MLE): The proof assumes the true conditional distribution lies in the parameterized family; no misspecification analysis or robustness result is given, yet the abstract claims the method works for general noises. If the family is misspecified, the estimator need not recover the true A and Σ, which is load-bearing for the statistical guarantees.

Authors: Theorem 3.2 establishes consistency under the standard assumption of correct specification (true distribution lies in the family). We will revise the theorem statement and surrounding text to make this assumption explicit. We will also add a new remark discussing misspecification: because the parameterization separates the conditional mean and covariance (governed by A and Σ) from higher-order shape parameters, the MLE for A and Σ converges to the true values whenever the first two moments are correctly captured, even under shape misspecification. This provides a robustness guarantee for the quantities of primary interest; the added remark will include a brief sketch of the argument.

Revision: yes

Referee: §5 (simulations, Table 1): The reported RMSE improvements over OLS are shown only for selected non-Gaussian noises; without confirming that these noises lie outside the parameterization or providing quantitative coverage metrics, the results do not verify the general-case claim.

Authors: The simulation noises (Student's t with low degrees of freedom and Gaussian mixtures) were chosen precisely because they exhibit non-Gaussian features. In the revision we will add a paragraph in Section 5 that (i) verifies these distributions lie outside the Gaussian subfamily of the parameterization and (ii) reports quantitative coverage metrics, specifically the minimal KL divergence between each true noise distribution and its best fit within the parameterized family. We will also expand the simulation suite with one additional noise type and update Table 1 to include these metrics, thereby directly addressing the general-case claim.

Revision: yes
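
A coverage metric of the kind proposed in this response can be illustrated in one dimension. The sketch below is an assumption-laden stand-in: the paper's family is not specified, so the Gaussian family plays the role of "the parameterized family" and a unit-scale Student-t plays the role of the true noise. It uses the fact that the Gaussian minimizing KL(p‖q) matches the mean and variance of p, so the minimal divergence equals the Gaussian entropy at that variance minus the entropy of p. The function names are hypothetical.

```python
import math
import numpy as np

def student_t_logpdf(x, nu):
    """Log-density of the unit-scale univariate Student-t with nu degrees of freedom."""
    c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return c - (nu + 1) / 2 * np.log1p(x ** 2 / nu)

def kl_to_best_gaussian(nu, half_width=200.0, num=400_001):
    """min over s2 of KL(t_nu || N(0, s2)), for nu > 2.

    The minimiser is s2 = Var(t_nu) = nu / (nu - 2); the minimum equals
    the entropy of N(0, s2) minus the entropy of t_nu.
    """
    var = nu / (nu - 2.0)
    x = np.linspace(-half_width, half_width, num)
    lp = student_t_logpdf(x, nu)
    f = -np.exp(lp) * lp                          # integrand of the entropy of t_nu
    entropy_t = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))  # trapezoid rule on a truncated grid
    entropy_gauss = 0.5 * math.log(2 * math.pi * math.e * var)
    return entropy_gauss - entropy_t

kl_by_dof = {nu: kl_to_best_gaussian(nu) for nu in (3.0, 5.0, 30.0)}
for nu, kl in kl_by_dof.items():
    print("nu = %4.1f  min KL to Gaussian family = %.4f" % (nu, kl))
```

A value near zero indicates the family covers the noise law well; for the paper's actual family the inner minimization would generally need a numerical optimizer rather than moment matching.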

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces a novel parameterization of the state-transition distribution along with MLE and SME estimators, then derives statistical properties and sample-complexity bounds for them. No quoted equations or claims reduce these new objects to quantities already fitted from the same data by construction, nor do they rely on self-citation chains that collapse the central claim. The framework is presented as self-contained with independent analysis of the estimators' behavior under the stated parameterization.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the existence of a tractable yet expressive parameterization of the state-transition distribution that encodes shape information beyond first and second moments; the linear-system-plus-additive-noise model is taken as given.

axioms (1)

Domain assumption: The underlying system is linear with additive noise whose distribution belongs to a parameterized family that can be estimated from transitions.
Stated in the problem formulation for linear systems under general noise distributions.

invented entities (1)

Novel parameterization of the state-transition distribution (no independent evidence)
Purpose: to encode distributional shape information for joint estimation of A and Σ.
Introduced as the key technical device enabling the MLE and SME estimators.

