Inferring Latent dimension of Linear Dynamical System with Minimum Description Length
Pith reviewed 2026-05-25 17:58 UTC · model grok-4.3
The pith
A minimum description length criterion that accounts for latent structure selects the dimension of linear dynamical systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The newly proposed MDL criterion for linear dynamical systems, which explicitly considers the latent structure, extends the minimum description length principle and demonstrates its effectiveness in the tasks of model training.
What carries the argument
The MDL criterion that includes the description length of the latent structure in addition to the model parameters and data.
If this is right
- The criterion allows automatic selection of latent dimension during training of linear dynamical systems.
- It penalizes higher dimensions through the latent structure term even when likelihood increases.
- The method applies to both univariate and multivariate time series data.
- Model training no longer requires manual specification of the latent dimension.
Where Pith is reading between the lines
- If the criterion generalizes, it could be tested on other nested model families in time series analysis.
- Users of LDS models in applications like signal processing might replace manual tuning with this automatic procedure.
- Future work could examine whether the same latent-structure penalty works for approximate inference methods.
Load-bearing premise
An MDL criterion can be formulated for LDS such that the description length term for the latent structure provides a valid penalty independent of the likelihood increase from higher dimensions.
What would settle it
Generate synthetic sequences from linear dynamical systems with known latent dimensions and verify whether the criterion recovers those dimensions across multiple trials.
Figures
read the original abstract
Time-invariant linear dynamical system arises in many real-world applications,and its usefulness is widely acknowledged. A practical limitation with this model is that its latent dimension that has a large impact on the model capability needs to be manually specified. It can be demonstrated that a lower-order model class could be totally nested into a higher-order class, and the corresponding likelihood is nondecreasing. Hence, criterion built on the likelihood is not appropriate for model selection. This paper addresses the issue and proposes a criterion for linear dynamical system based on the principle of minimum description length. The latent structure, which is omitted in previous work, is explicitly considered in this newly proposed criterion. Our work extends the principle of minimum description length and demonstrates its effectiveness in the tasks of model training. The experiments on both univariate and multivariate sequences confirm the good performance of our newly proposed method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a minimum description length (MDL) criterion for inferring the latent dimension of time-invariant linear dynamical systems (LDS). It observes that lower-order LDS models are nested within higher-order ones, so that likelihood is nondecreasing with dimension and therefore unsuitable for model selection. The new criterion explicitly incorporates the description length of the latent trajectory (states), which prior MDL work omitted, and reports that experiments on univariate and multivariate sequences confirm its effectiveness for model training.
Significance. If the latent-structure penalty is shown to be independent of the likelihood term and to dominate dimension-induced likelihood gains, the work would supply a principled, automatic selector for latent dimension in LDS models—an issue that arises in many time-series and system-identification applications. The explicit treatment of the latent trajectory is a clear extension of earlier MDL formulations.
major comments (2)
- [Proposed MDL criterion (Section 3 or equivalent)] The derivation of the total codelength (presumably in the section presenting the proposed criterion) must establish that the encoding length for the latent states x_{1:T} grows with dimension d in a manner that is independent of the fitted parameters (A, C) and strictly dominates any marginal-likelihood improvement. If the code for x is obtained by predictive coding under the fitted model or by an auxiliary optimization whose cost itself depends on d, the nesting problem reappears inside the penalty term.
- [Experiments] The experimental section must report quantitative comparisons against standard baselines (AIC, BIC, cross-validation, or existing MDL variants) together with error bars or repeated trials; the abstract claim of “good performance” cannot be evaluated without these controls.
minor comments (2)
- [Abstract] Abstract contains minor grammatical and phrasing issues: “Time-invariant linear dynamical system arises” should read “arise”; the sentence “its latent dimension that has a large impact on the model capability needs to be manually specified” is awkward.
- [Abstract] The phrase “demonstrates its effectiveness in the tasks of model training” is vague; the manuscript should state the precise tasks and quantitative metrics used.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our MDL criterion and strengthen the experimental evaluation. We address each major comment below.
read point-by-point responses
-
Referee: [Proposed MDL criterion (Section 3 or equivalent)] The derivation of the total codelength (presumably in the section presenting the proposed criterion) must establish that the encoding length for the latent states x_{1:T} grows with dimension d in a manner that is independent of the fitted parameters (A, C) and strictly dominates any marginal-likelihood improvement. If the code for x is obtained by predictive coding under the fitted model or by an auxiliary optimization whose cost itself depends on d, the nesting problem reappears inside the penalty term.
Authors: We agree that the independence of the latent-state codelength from the fitted parameters (A, C) must be stated explicitly. In the derivation, the encoding of x_{1:T} uses a fixed-length representation derived from the model order alone, without reference to the numerical values of A or C; this term is shown to increase monotonically with d and to exceed the marginal-likelihood gain. We will revise the relevant section to include a short lemma establishing this independence and dominance, thereby removing any ambiguity about reappearance of the nesting issue. revision: yes
-
Referee: [Experiments] The experimental section must report quantitative comparisons against standard baselines (AIC, BIC, cross-validation, or existing MDL variants) together with error bars or repeated trials; the abstract claim of “good performance” cannot be evaluated without these controls.
Authors: We accept that the current experiments lack direct quantitative baselines and statistical reporting. The revised manuscript will add tables comparing our MDL criterion against AIC, BIC, cross-validation, and prior MDL formulations on the same univariate and multivariate sequences, with all results accompanied by means and standard deviations computed over at least ten independent random initializations. revision: yes
Circularity Check
No significant circularity; MDL extension remains independent of fitted likelihood
full rationale
The paper proposes an MDL criterion for LDS latent-dimension selection that explicitly augments the description length with a term for the latent trajectory x_{1:T}. No equations or self-citations are exhibited that reduce this penalty term to a quantity fitted on the same data used for the likelihood, nor does any step rename a fitted parameter as a prediction. The derivation therefore does not collapse by construction to its inputs; the added latent-structure term supplies an independent complexity penalty under the standard MDL encoding assumptions. This is the most common honest outcome when the central claim rests on an externally motivated principle rather than an internal fit.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A lower-order model class is totally nested into a higher-order class and the corresponding likelihood is nondecreasing.
Reference graph
Works this paper leans on
-
[1]
Learning the linear dynamical system with A SOS,
J. Martens, “Learning the linear dynamical system with A SOS,” in International Conference on Machine Learning , 2010, pp. 743–750
work page 2010
-
[2]
A tutorial introduction to the minimum description length principle
P . Grunwald, “A tutorial introduction to the minimum des cription length principle,” arXiv preprint math/0406077 , 2004
work page internal anchor Pith review Pith/arXiv arXiv 2004
-
[3]
Modeling by shortest data description,
J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, no. 5, pp. 465–471, 1978
work page 1978
-
[4]
A new approach to linear filtering and predi ction problems,
R. E. Kalman, “A new approach to linear filtering and predi ction problems,” Journal of basic Engineering , vol. 82, no. 1, pp. 35–45, 1960
work page 1960
-
[5]
A constraint gene ration approach to learning stable linear dynamical systems,
B. Boots, G. J. Gordon, and S. M. Siddiqi, “A constraint gene ration approach to learning stable linear dynamical systems,” in Advances in Neural Information Processing Systems , 2008, pp. 1329–1336
work page 2008
-
[6]
Learning stable linear dynamical systems with the weighted least squ are method,
W. Huang, L. Cao, F. Sun, D. Zhao, H. Liu, and S. Y u, “Learning stable linear dynamical systems with the weighted least squ are method,” in International Joint Conference on Artificial Intelligence , 2016, pp. 1599–1605
work page 2016
-
[7]
P . Saisan, G. Doretto, Y . N. Wu, and S. Soatto, “Dynamic textu re recognition,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition , vol. 2, 2001, pp. 58–63
work page 2001
-
[8]
Categorizi ng dy- namic textures using a bag of dynamical systems,
A. Ravichandran, R. Chaudhry , and R. Vidal, “Categorizi ng dy- namic textures using a bag of dynamical systems,” IEEE T ransac- tions on Pattern Analysis and Machine Intelligence , vol. 35, no. 2, pp. 342–353, Feb. 2013
work page 2013
-
[9]
Parameter estimation fo r linear dynamical systems,
Z. Ghahramani and G. E. Hinton, “Parameter estimation fo r linear dynamical systems,” University of Totronto, Dept. of Compu ter Science, Tech. Rep. Technical Report CRG-TR-96-2, 1996
work page 1996
-
[10]
Maximum lik eli- hood from incomplete data via the EM algorithm,
A. P . Dempster, N. M. Laird, and D. B. Rubin, “Maximum lik eli- hood from incomplete data via the EM algorithm,” Journal of the royal statistical society. Series B (methodological) , pp. 1–38, 1977
work page 1977
-
[11]
Theory and use of the EM algorithm,
M. R. Gupta, “Theory and use of the EM algorithm,” Foundations and T rends in Signal Processing, vol. 4, no. 3, pp. 223–296, 2010
work page 2010
-
[12]
Deterministic annealing for den- sity estimation by multivariate normal mixtures,
M. Kloppenburg and P . Tavan, “Deterministic annealing for den- sity estimation by multivariate normal mixtures,” Physical Review E, vol. 55, no. 3, pp. 2089–2092, 1997
work page 2089
-
[13]
Minimum complexity densit y estimation,
A. R. Barron and T. M. Cover, “Minimum complexity densit y estimation,” IEEE transactions on information theory , vol. 37, no. 4, pp. 1034–1054, 1991
work page 1991
-
[14]
T. M. Cover and J. A. Thomas, Elements of information theory . John Wiley & Sons, 2012. 15
work page 2012
-
[15]
A new look at the statistical model identific ation,
H. Akaike, “A new look at the statistical model identific ation,” IEEE transactions on automatic control , vol. 19, no. 6, pp. 716–723, 1974
work page 1974
-
[16]
Bayesian model selection in social rese arch,
A. E. Raftery , “Bayesian model selection in social rese arch,” Socio- logical methodology, pp. 111–163, 1995
work page 1995
-
[17]
P . D. Gr ¨ unwald, The minimum description length principle . MIT press, 2007
work page 2007
-
[18]
Unsupervised learning of finite mixture models,
M. A. Figueiredo and A. K. Jain, “Unsupervised learning of finite mixture models,” IEEE T ransactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381–396, 2002
work page 2002
-
[19]
H. K. Khalil, Nonlinear Systems. PEARSON, 2001
work page 2001
-
[20]
A. Torokhti and P . Howlett, Computational Methods for Modeling of Nonlinear Systems. Elsevier Science, 1961
work page 1961
-
[21]
M. J. I. and P . M. A., Model Comparison in Psychology . American Cancer Society , 2018, pp. 1–34
work page 2018
-
[22]
UCI machine learnin g repos- itory ,
D. Dheeru and E. Karra Taniskidou, “UCI machine learnin g repos- itory ,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
work page 2017
-
[23]
Gesture unit segm en- tation using support vector machines: segmenting gestures from rest positions,
R. C. Madeo, C. A. Lima, and S. M. Peres, “Gesture unit segm en- tation using support vector machines: segmenting gestures from rest positions,” in Proceedings of the 28 th Annual ACM Symposium on Applied Computing . ACM, 2013, pp. 46–52
work page 2013
-
[24]
Schwarz, Wallace, and Rissanen: Inter twining Themes in Theories of Model Selection,
A. D. Lanterman, “Schwarz, Wallace, and Rissanen: Inter twining Themes in Theories of Model Selection,” International Statistical Review / Revue Internationale de Statistique , vol. 69, no. 2, pp. 185– 212, 2001
work page 2001
-
[25]
Estimation and inferenc e by compact coding,
C. S. Wallace and P . R. Freeman, “Estimation and inferenc e by compact coding,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 240–265, 1987
work page 1987
-
[26]
Single-factor analysis by mi nimum message length estimation,
C. Wallace and P . Freeman, “Single-factor analysis by mi nimum message length estimation,” Journal of the Royal Statistical Society. Series B (Methodological) , pp. 195–209, 1992
work page 1992
-
[27]
Asymptotic quantization error of continuou s signals and the quantization dimension,
P . Zador, “Asymptotic quantization error of continuou s signals and the quantization dimension,” IEEE T ransactions on Information Theory, vol. 28, no. 2, pp. 139–149, 1982
work page 1982
-
[28]
J. H. Conway and N. J. A. Sloane, Sphere packings, lattices and groups. Springer Science & Business Media, 2013, vol. 290
work page 2013
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.