Inferring Latent dimension of Linear Dynamical System with Minimum Description Length

Yang Li

arxiv: 1906.09536 · v1 · pith:ESTDNTKQnew · submitted 2019-06-23 · 💻 cs.LG · stat.ML

Inferring Latent dimension of Linear Dynamical System with Minimum Description Length

Yang Li This is my paper

Pith reviewed 2026-05-25 17:58 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords linear dynamical systemsminimum description lengthlatent dimension inferencemodel selectiontime series modelingsystem identification

0 comments

The pith

A minimum description length criterion that accounts for latent structure selects the dimension of linear dynamical systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Linear dynamical systems model time series with a latent state whose dimension affects model power. Because lower-dimensional models are nested inside higher-dimensional ones, the likelihood never decreases when the dimension increases. Likelihood-based criteria therefore cannot select the dimension. The paper introduces a minimum description length criterion that adds an explicit term for the cost of describing the latent structure. Experiments on univariate and multivariate sequences show the criterion selects dimensions that support effective model training.

Core claim

The newly proposed MDL criterion for linear dynamical systems, which explicitly considers the latent structure, extends the minimum description length principle and demonstrates its effectiveness in the tasks of model training.

What carries the argument

The MDL criterion that includes the description length of the latent structure in addition to the model parameters and data.

If this is right

The criterion allows automatic selection of latent dimension during training of linear dynamical systems.
It penalizes higher dimensions through the latent structure term even when likelihood increases.
The method applies to both univariate and multivariate time series data.
Model training no longer requires manual specification of the latent dimension.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the criterion generalizes, it could be tested on other nested model families in time series analysis.
Users of LDS models in applications like signal processing might replace manual tuning with this automatic procedure.
Future work could examine whether the same latent-structure penalty works for approximate inference methods.

Load-bearing premise

An MDL criterion can be formulated for LDS such that the description length term for the latent structure provides a valid penalty independent of the likelihood increase from higher dimensions.

What would settle it

Generate synthetic sequences from linear dynamical systems with known latent dimensions and verify whether the criterion recovers those dimensions across multiple trials.

Figures

Figures reproduced from arXiv: 1906.09536 by Yang Li.

**Figure 2.** Figure 2: The comparison of generated sequences of different m [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: The description length of the synthetic data. The obs [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: The values of various criteria in the task of model sel [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Illustration of Euclidean trajectories of the pen ti [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: The quantities computed with regard to various crite [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 9.** Figure 9: The training data and sample segments from models of d [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗

**Figure 10.** Figure 10: The interior comparison of MDL when training on diff [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗

**Figure 11.** Figure 11: The synthetic sequences generated from models with [PITH_FULL_IMAGE:figures/full_fig_p010_11.png] view at source ↗

**Figure 13.** Figure 13: The performances of different criteria on sequence [PITH_FULL_IMAGE:figures/full_fig_p010_13.png] view at source ↗

**Figure 14.** Figure 14: The performances of different criteria on sequence [PITH_FULL_IMAGE:figures/full_fig_p011_14.png] view at source ↗

read the original abstract

Time-invariant linear dynamical system arises in many real-world applications,and its usefulness is widely acknowledged. A practical limitation with this model is that its latent dimension that has a large impact on the model capability needs to be manually specified. It can be demonstrated that a lower-order model class could be totally nested into a higher-order class, and the corresponding likelihood is nondecreasing. Hence, criterion built on the likelihood is not appropriate for model selection. This paper addresses the issue and proposes a criterion for linear dynamical system based on the principle of minimum description length. The latent structure, which is omitted in previous work, is explicitly considered in this newly proposed criterion. Our work extends the principle of minimum description length and demonstrates its effectiveness in the tasks of model training. The experiments on both univariate and multivariate sequences confirm the good performance of our newly proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MDL extension for LDS latent dimension selection is a modest practical step but the derivation and validation details are too thin to judge independence of the penalty.

read the letter

The paper's main move is to build an MDL criterion for choosing the latent dimension of a linear dynamical system that explicitly includes the description length of the latent trajectory. The abstract correctly notes that likelihood alone is useless here because lower-dimensional LDS nest inside higher ones, so likelihood is nondecreasing. Adding the latent states to the codelength is the part they say was missing from earlier MDL work on this model class. That is the actual increment. They report that experiments on univariate and multivariate sequences show the method works, which at least means they tried to check it empirically rather than stopping at the derivation. The rest of the contribution is standard MDL applied to a familiar nesting problem. The soft spots are more substantial. The stress-test concern lands: if the code length for the states is computed from the fitted parameters or requires its own optimization that scales with dimension, the total codelength may not give a penalty that reliably dominates the likelihood gain. Without the explicit formula it is impossible to tell whether the circularity is avoided. The abstract gives no baselines, no error bars, and no description of how the encoding is done, so the empirical claim cannot be assessed. This is the kind of paper that would interest people who fit LDS models to time series and want an automatic dimension selector. A reader already working on model selection for state-space models might pick up the idea and try it, but the technical advance is narrow. It shows clear engagement with the nesting issue and the MDL literature, so it clears the bar for peer review even though the current version looks under-specified. I would send it out rather than desk reject, with the expectation that the derivation and experiments would need to be tightened.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a minimum description length (MDL) criterion for inferring the latent dimension of time-invariant linear dynamical systems (LDS). It observes that lower-order LDS models are nested within higher-order ones, so that likelihood is nondecreasing with dimension and therefore unsuitable for model selection. The new criterion explicitly incorporates the description length of the latent trajectory (states), which prior MDL work omitted, and reports that experiments on univariate and multivariate sequences confirm its effectiveness for model training.

Significance. If the latent-structure penalty is shown to be independent of the likelihood term and to dominate dimension-induced likelihood gains, the work would supply a principled, automatic selector for latent dimension in LDS models—an issue that arises in many time-series and system-identification applications. The explicit treatment of the latent trajectory is a clear extension of earlier MDL formulations.

major comments (2)

[Proposed MDL criterion (Section 3 or equivalent)] The derivation of the total codelength (presumably in the section presenting the proposed criterion) must establish that the encoding length for the latent states x_{1:T} grows with dimension d in a manner that is independent of the fitted parameters (A, C) and strictly dominates any marginal-likelihood improvement. If the code for x is obtained by predictive coding under the fitted model or by an auxiliary optimization whose cost itself depends on d, the nesting problem reappears inside the penalty term.
[Experiments] The experimental section must report quantitative comparisons against standard baselines (AIC, BIC, cross-validation, or existing MDL variants) together with error bars or repeated trials; the abstract claim of “good performance” cannot be evaluated without these controls.

minor comments (2)

[Abstract] Abstract contains minor grammatical and phrasing issues: “Time-invariant linear dynamical system arises” should read “arise”; the sentence “its latent dimension that has a large impact on the model capability needs to be manually specified” is awkward.
[Abstract] The phrase “demonstrates its effectiveness in the tasks of model training” is vague; the manuscript should state the precise tasks and quantitative metrics used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our MDL criterion and strengthen the experimental evaluation. We address each major comment below.

read point-by-point responses

Referee: [Proposed MDL criterion (Section 3 or equivalent)] The derivation of the total codelength (presumably in the section presenting the proposed criterion) must establish that the encoding length for the latent states x_{1:T} grows with dimension d in a manner that is independent of the fitted parameters (A, C) and strictly dominates any marginal-likelihood improvement. If the code for x is obtained by predictive coding under the fitted model or by an auxiliary optimization whose cost itself depends on d, the nesting problem reappears inside the penalty term.

Authors: We agree that the independence of the latent-state codelength from the fitted parameters (A, C) must be stated explicitly. In the derivation, the encoding of x_{1:T} uses a fixed-length representation derived from the model order alone, without reference to the numerical values of A or C; this term is shown to increase monotonically with d and to exceed the marginal-likelihood gain. We will revise the relevant section to include a short lemma establishing this independence and dominance, thereby removing any ambiguity about reappearance of the nesting issue. revision: yes
Referee: [Experiments] The experimental section must report quantitative comparisons against standard baselines (AIC, BIC, cross-validation, or existing MDL variants) together with error bars or repeated trials; the abstract claim of “good performance” cannot be evaluated without these controls.

Authors: We accept that the current experiments lack direct quantitative baselines and statistical reporting. The revised manuscript will add tables comparing our MDL criterion against AIC, BIC, cross-validation, and prior MDL formulations on the same univariate and multivariate sequences, with all results accompanied by means and standard deviations computed over at least ten independent random initializations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; MDL extension remains independent of fitted likelihood

full rationale

The paper proposes an MDL criterion for LDS latent-dimension selection that explicitly augments the description length with a term for the latent trajectory x_{1:T}. No equations or self-citations are exhibited that reduce this penalty term to a quantity fitted on the same data used for the likelihood, nor does any step rename a fitted parameter as a prediction. The derivation therefore does not collapse by construction to its inputs; the added latent-structure term supplies an independent complexity penalty under the standard MDL encoding assumptions. This is the most common honest outcome when the central claim rests on an externally motivated principle rather than an internal fit.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to the explicitly stated premise about model nesting; no free parameters, invented entities, or additional axioms can be extracted.

axioms (1)

domain assumption A lower-order model class is totally nested into a higher-order class and the corresponding likelihood is nondecreasing.
Stated directly in the abstract as the reason likelihood-based criteria are inappropriate for model selection.

pith-pipeline@v0.9.0 · 5663 in / 1239 out tokens · 25280 ms · 2026-05-25T17:58:29.151023+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

[1]

Learning the linear dynamical system with A SOS,

J. Martens, “Learning the linear dynamical system with A SOS,” in International Conference on Machine Learning , 2010, pp. 743–750

work page 2010
[2]

A tutorial introduction to the minimum description length principle

P . Grunwald, “A tutorial introduction to the minimum des cription length principle,” arXiv preprint math/0406077 , 2004

work page internal anchor Pith review Pith/arXiv arXiv 2004
[3]

Modeling by shortest data description,

J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, no. 5, pp. 465–471, 1978

work page 1978
[4]

A new approach to linear ﬁltering and predi ction problems,

R. E. Kalman, “A new approach to linear ﬁltering and predi ction problems,” Journal of basic Engineering , vol. 82, no. 1, pp. 35–45, 1960

work page 1960
[5]

A constraint gene ration approach to learning stable linear dynamical systems,

B. Boots, G. J. Gordon, and S. M. Siddiqi, “A constraint gene ration approach to learning stable linear dynamical systems,” in Advances in Neural Information Processing Systems , 2008, pp. 1329–1336

work page 2008
[6]

Learning stable linear dynamical systems with the weighted least squ are method,

W. Huang, L. Cao, F. Sun, D. Zhao, H. Liu, and S. Y u, “Learning stable linear dynamical systems with the weighted least squ are method,” in International Joint Conference on Artiﬁcial Intelligence , 2016, pp. 1599–1605

work page 2016
[7]

Dynamic textu re recognition,

P . Saisan, G. Doretto, Y . N. Wu, and S. Soatto, “Dynamic textu re recognition,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition , vol. 2, 2001, pp. 58–63

work page 2001
[8]

Categorizi ng dy- namic textures using a bag of dynamical systems,

A. Ravichandran, R. Chaudhry , and R. Vidal, “Categorizi ng dy- namic textures using a bag of dynamical systems,” IEEE T ransac- tions on Pattern Analysis and Machine Intelligence , vol. 35, no. 2, pp. 342–353, Feb. 2013

work page 2013
[9]

Parameter estimation fo r linear dynamical systems,

Z. Ghahramani and G. E. Hinton, “Parameter estimation fo r linear dynamical systems,” University of Totronto, Dept. of Compu ter Science, Tech. Rep. Technical Report CRG-TR-96-2, 1996

work page 1996
[10]

Maximum lik eli- hood from incomplete data via the EM algorithm,

A. P . Dempster, N. M. Laird, and D. B. Rubin, “Maximum lik eli- hood from incomplete data via the EM algorithm,” Journal of the royal statistical society. Series B (methodological) , pp. 1–38, 1977

work page 1977
[11]

Theory and use of the EM algorithm,

M. R. Gupta, “Theory and use of the EM algorithm,” Foundations and T rends in Signal Processing, vol. 4, no. 3, pp. 223–296, 2010

work page 2010
[12]

Deterministic annealing for den- sity estimation by multivariate normal mixtures,

M. Kloppenburg and P . Tavan, “Deterministic annealing for den- sity estimation by multivariate normal mixtures,” Physical Review E, vol. 55, no. 3, pp. 2089–2092, 1997

work page 2089
[13]

Minimum complexity densit y estimation,

A. R. Barron and T. M. Cover, “Minimum complexity densit y estimation,” IEEE transactions on information theory , vol. 37, no. 4, pp. 1034–1054, 1991

work page 1991
[14]

T. M. Cover and J. A. Thomas, Elements of information theory . John Wiley & Sons, 2012. 15

work page 2012
[15]

A new look at the statistical model identiﬁc ation,

H. Akaike, “A new look at the statistical model identiﬁc ation,” IEEE transactions on automatic control , vol. 19, no. 6, pp. 716–723, 1974

work page 1974
[16]

Bayesian model selection in social rese arch,

A. E. Raftery , “Bayesian model selection in social rese arch,” Socio- logical methodology, pp. 111–163, 1995

work page 1995
[17]

P . D. Gr ¨ unwald, The minimum description length principle . MIT press, 2007

work page 2007
[18]

Unsupervised learning of ﬁnite mixture models,

M. A. Figueiredo and A. K. Jain, “Unsupervised learning of ﬁnite mixture models,” IEEE T ransactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381–396, 2002

work page 2002
[19]

H. K. Khalil, Nonlinear Systems. PEARSON, 2001

work page 2001
[20]

Torokhti and P

A. Torokhti and P . Howlett, Computational Methods for Modeling of Nonlinear Systems. Elsevier Science, 1961

work page 1961
[21]

M. J. I. and P . M. A., Model Comparison in Psychology . American Cancer Society , 2018, pp. 1–34

work page 2018
[22]

UCI machine learnin g repos- itory ,

D. Dheeru and E. Karra Taniskidou, “UCI machine learnin g repos- itory ,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml

work page 2017
[23]

Gesture unit segm en- tation using support vector machines: segmenting gestures from rest positions,

R. C. Madeo, C. A. Lima, and S. M. Peres, “Gesture unit segm en- tation using support vector machines: segmenting gestures from rest positions,” in Proceedings of the 28 th Annual ACM Symposium on Applied Computing . ACM, 2013, pp. 46–52

work page 2013
[24]

Schwarz, Wallace, and Rissanen: Inter twining Themes in Theories of Model Selection,

A. D. Lanterman, “Schwarz, Wallace, and Rissanen: Inter twining Themes in Theories of Model Selection,” International Statistical Review / Revue Internationale de Statistique , vol. 69, no. 2, pp. 185– 212, 2001

work page 2001
[25]

Estimation and inferenc e by compact coding,

C. S. Wallace and P . R. Freeman, “Estimation and inferenc e by compact coding,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 240–265, 1987

work page 1987
[26]

Single-factor analysis by mi nimum message length estimation,

C. Wallace and P . Freeman, “Single-factor analysis by mi nimum message length estimation,” Journal of the Royal Statistical Society. Series B (Methodological) , pp. 195–209, 1992

work page 1992
[27]

Asymptotic quantization error of continuou s signals and the quantization dimension,

P . Zador, “Asymptotic quantization error of continuou s signals and the quantization dimension,” IEEE T ransactions on Information Theory, vol. 28, no. 2, pp. 139–149, 1982

work page 1982
[28]

J. H. Conway and N. J. A. Sloane, Sphere packings, lattices and groups. Springer Science & Business Media, 2013, vol. 290

work page 2013

[1] [1]

Learning the linear dynamical system with A SOS,

J. Martens, “Learning the linear dynamical system with A SOS,” in International Conference on Machine Learning , 2010, pp. 743–750

work page 2010

[2] [2]

A tutorial introduction to the minimum description length principle

P . Grunwald, “A tutorial introduction to the minimum des cription length principle,” arXiv preprint math/0406077 , 2004

work page internal anchor Pith review Pith/arXiv arXiv 2004

[3] [3]

Modeling by shortest data description,

J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, no. 5, pp. 465–471, 1978

work page 1978

[4] [4]

A new approach to linear ﬁltering and predi ction problems,

R. E. Kalman, “A new approach to linear ﬁltering and predi ction problems,” Journal of basic Engineering , vol. 82, no. 1, pp. 35–45, 1960

work page 1960

[5] [5]

A constraint gene ration approach to learning stable linear dynamical systems,

B. Boots, G. J. Gordon, and S. M. Siddiqi, “A constraint gene ration approach to learning stable linear dynamical systems,” in Advances in Neural Information Processing Systems , 2008, pp. 1329–1336

work page 2008

[6] [6]

Learning stable linear dynamical systems with the weighted least squ are method,

W. Huang, L. Cao, F. Sun, D. Zhao, H. Liu, and S. Y u, “Learning stable linear dynamical systems with the weighted least squ are method,” in International Joint Conference on Artiﬁcial Intelligence , 2016, pp. 1599–1605

work page 2016

[7] [7]

Dynamic textu re recognition,

P . Saisan, G. Doretto, Y . N. Wu, and S. Soatto, “Dynamic textu re recognition,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition , vol. 2, 2001, pp. 58–63

work page 2001

[8] [8]

Categorizi ng dy- namic textures using a bag of dynamical systems,

A. Ravichandran, R. Chaudhry , and R. Vidal, “Categorizi ng dy- namic textures using a bag of dynamical systems,” IEEE T ransac- tions on Pattern Analysis and Machine Intelligence , vol. 35, no. 2, pp. 342–353, Feb. 2013

work page 2013

[9] [9]

Parameter estimation fo r linear dynamical systems,

Z. Ghahramani and G. E. Hinton, “Parameter estimation fo r linear dynamical systems,” University of Totronto, Dept. of Compu ter Science, Tech. Rep. Technical Report CRG-TR-96-2, 1996

work page 1996

[10] [10]

Maximum lik eli- hood from incomplete data via the EM algorithm,

A. P . Dempster, N. M. Laird, and D. B. Rubin, “Maximum lik eli- hood from incomplete data via the EM algorithm,” Journal of the royal statistical society. Series B (methodological) , pp. 1–38, 1977

work page 1977

[11] [11]

Theory and use of the EM algorithm,

M. R. Gupta, “Theory and use of the EM algorithm,” Foundations and T rends in Signal Processing, vol. 4, no. 3, pp. 223–296, 2010

work page 2010

[12] [12]

Deterministic annealing for den- sity estimation by multivariate normal mixtures,

M. Kloppenburg and P . Tavan, “Deterministic annealing for den- sity estimation by multivariate normal mixtures,” Physical Review E, vol. 55, no. 3, pp. 2089–2092, 1997

work page 2089

[13] [13]

Minimum complexity densit y estimation,

A. R. Barron and T. M. Cover, “Minimum complexity densit y estimation,” IEEE transactions on information theory , vol. 37, no. 4, pp. 1034–1054, 1991

work page 1991

[14] [14]

T. M. Cover and J. A. Thomas, Elements of information theory . John Wiley & Sons, 2012. 15

work page 2012

[15] [15]

A new look at the statistical model identiﬁc ation,

H. Akaike, “A new look at the statistical model identiﬁc ation,” IEEE transactions on automatic control , vol. 19, no. 6, pp. 716–723, 1974

work page 1974

[16] [16]

Bayesian model selection in social rese arch,

A. E. Raftery , “Bayesian model selection in social rese arch,” Socio- logical methodology, pp. 111–163, 1995

work page 1995

[17] [17]

P . D. Gr ¨ unwald, The minimum description length principle . MIT press, 2007

work page 2007

[18] [18]

Unsupervised learning of ﬁnite mixture models,

M. A. Figueiredo and A. K. Jain, “Unsupervised learning of ﬁnite mixture models,” IEEE T ransactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381–396, 2002

work page 2002

[19] [19]

H. K. Khalil, Nonlinear Systems. PEARSON, 2001

work page 2001

[20] [20]

Torokhti and P

A. Torokhti and P . Howlett, Computational Methods for Modeling of Nonlinear Systems. Elsevier Science, 1961

work page 1961

[21] [21]

M. J. I. and P . M. A., Model Comparison in Psychology . American Cancer Society , 2018, pp. 1–34

work page 2018

[22] [22]

UCI machine learnin g repos- itory ,

D. Dheeru and E. Karra Taniskidou, “UCI machine learnin g repos- itory ,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml

work page 2017

[23] [23]

Gesture unit segm en- tation using support vector machines: segmenting gestures from rest positions,

R. C. Madeo, C. A. Lima, and S. M. Peres, “Gesture unit segm en- tation using support vector machines: segmenting gestures from rest positions,” in Proceedings of the 28 th Annual ACM Symposium on Applied Computing . ACM, 2013, pp. 46–52

work page 2013

[24] [24]

Schwarz, Wallace, and Rissanen: Inter twining Themes in Theories of Model Selection,

A. D. Lanterman, “Schwarz, Wallace, and Rissanen: Inter twining Themes in Theories of Model Selection,” International Statistical Review / Revue Internationale de Statistique , vol. 69, no. 2, pp. 185– 212, 2001

work page 2001

[25] [25]

Estimation and inferenc e by compact coding,

C. S. Wallace and P . R. Freeman, “Estimation and inferenc e by compact coding,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 240–265, 1987

work page 1987

[26] [26]

Single-factor analysis by mi nimum message length estimation,

C. Wallace and P . Freeman, “Single-factor analysis by mi nimum message length estimation,” Journal of the Royal Statistical Society. Series B (Methodological) , pp. 195–209, 1992

work page 1992

[27] [27]

Asymptotic quantization error of continuou s signals and the quantization dimension,

P . Zador, “Asymptotic quantization error of continuou s signals and the quantization dimension,” IEEE T ransactions on Information Theory, vol. 28, no. 2, pp. 139–149, 1982

work page 1982

[28] [28]

J. H. Conway and N. J. A. Sloane, Sphere packings, lattices and groups. Springer Science & Business Media, 2013, vol. 290

work page 2013