pith. sign in

arxiv: 1906.09920 · v1 · pith:TGFRKMPDnew · submitted 2019-06-24 · 📡 eess.SP

Online Variational Bayesian Subspace Filtering with Applications

Pith reviewed 2026-05-25 17:27 UTC · model grok-4.3

classification 📡 eess.SP
keywords variational Bayesian inferencesubspace trackingautomatic relevance determinationmatrix completiontime series imputationonline filteringoutlier rejectionstate space models
0
0 comments X

The pith

A variational Bayesian method learns time-varying low-dimensional subspaces from sequential data without tuning rank or noise power.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops generative Bayesian models for sequential multivariate measurements that come from a low-dimensional time-varying subspace. It introduces a variational Bayesian subspace filtering procedure that jointly infers the subspace, its state-transition matrix, and the relevant dimensions via automatic relevance determination priors. Because these priors automatically prune irrelevant components, the method removes the need to preset the subspace rank or noise variance. A forward-backward message-passing algorithm keeps the per-step cost linear in the data dimensions. Experiments on traffic and electricity time series show improved imputation, outlier removal, and future prediction compared with deterministic matrix and tensor completion baselines.

Core claim

The paper claims that a variational Bayesian filter, equipped with automatic relevance determination priors on the subspace basis and transition matrix, can recover the underlying time-varying subspace and perform imputation, outlier rejection, and prediction without requiring the user to specify the rank or noise power in advance.

What carries the argument

Variational Bayesian inference with automatic relevance determination priors placed on the columns of the subspace matrix and on the entries of the state-transition matrix.

If this is right

  • The learned state-transition matrix directly supplies one-step-ahead predictions of the next subspace state.
  • Outlier rejection occurs automatically because the noise model is inferred jointly with the subspace.
  • The forward-backward algorithm enables constant-memory online processing of streaming multivariate observations.
  • Missing entries are imputed by marginalizing over the posterior of the current subspace coefficients.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same ARD mechanism could be inserted into other online subspace trackers to remove manual rank selection in streaming settings.
  • Because the model is fully generative, it can be extended to handle non-Gaussian noise by changing only the likelihood term while retaining the same variational updates.
  • The low-complexity forward-backward schedule suggests the method remains tractable when the ambient dimension grows to thousands, provided the effective rank stays small.

Load-bearing premise

The observed sequential measurements are generated by a low-dimensional subspace whose evolution over time is governed by a linear state-transition matrix.

What would settle it

On synthetic data generated from a known low-rank linear dynamical system, the variational updates would fail to recover the correct number of active dimensions or would produce worse imputation error than a correctly tuned deterministic baseline.

Figures

Figures reproduced from arXiv: 1906.09920 by Charul, Ketan Rajawat, Pravesh Biyani, Uttkarsha Bhatt.

Figure 1
Figure 1. Figure 1: Online Variational Bayesian Filtering A. Hierarchical Bayesian Model We begin with detailing a generative model for the matrix Y. The proposed model will not only capture the rank deficient nature of Y [31] but also the temporal correlation between successive columns of Y [32]. Recall that the standard low￾rank parametrization of the full matrix Y takes the form Y = AB where A ∈ R m×r and B ∈ R r×t . Class… view at source ↗
Figure 2
Figure 2. Figure 2: (a) Hierarchical Bayesian Model for Matrix Completion [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Region where traffic data is collected [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Map with red as missing and blue as known traffic [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Estimation of traffic data for different percentage of missing entries [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Real time Traffic Estimation and Prediction for different missing entries [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison between VBSF and Low rank Tensor Completion (LRTC) and Matrix Completion Algorithm (GROUSE) [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Robust Bayesian subspace filtering for traffic data [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: To analyze the performance of our algorithm we [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 9
Figure 9. Figure 9: One-step ahead electricity prediction temporal evolution of the underlying low-rank subspace is characterized via a state-space model and low-complexity vari￾ational Bayesian subspace filtering algorithms are proposed for matrix completion and outlier removal tasks. Simulation experiments quantify that the suggested model can be deployed to estimate the missing traffic data with a reasonable accuracy even … view at source ↗
read the original abstract

Matrix completion and robust principal component analysis have been widely used for the recovery of data suffering from missing entries or outliers. In many real-world applications however, the data is also time-varying, and the naive approach of per-snapshot recovery is both expensive and sub-optimal. This paper develops generative Bayesian models that fit sequential multivariate measurements arising from a low-dimensional time-varying subspace. A variational Bayesian subspace filtering approach is proposed that learns the underlying subspace and its state-transition matrix. Different from the plethora of deterministic counterparts, the proposed approach utilizes automatic relevance determination priors that obviate the need to tune key parameters such as rank and noise power. We also propose a forward-backward algorithm that allows the updates to be carried out at low complexity. Extensive tests over traffic and electricity data demonstrate the superior imputation, outlier rejection, and temporal prediction prowess of the proposed algorithm over the state-of-the-art matrix/tensor completion algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes generative Bayesian models for sequential multivariate data arising from a low-dimensional time-varying subspace. It introduces a variational Bayesian subspace filtering algorithm that jointly learns the subspace and its state-transition matrix, employing automatic relevance determination (ARD) priors to eliminate the need to tune rank and noise power. A forward-backward algorithm is derived for efficient updates. Experiments on traffic and electricity datasets claim superior imputation, outlier rejection, and temporal prediction relative to state-of-the-art matrix and tensor completion methods.

Significance. If the performance claims hold under the stated generative model, the work provides a parameter-free Bayesian alternative to deterministic subspace tracking methods, with potential utility in online robust PCA and matrix completion for dynamic sensor or network data. The ARD mechanism and forward-backward scheme are presented as practical advantages.

major comments (2)
  1. [§2 and Experiments section] The central modeling assumption (sequential data exactly generated by a low-rank subspace whose evolution follows a linear state-transition matrix) is stated in the abstract and §2 but receives no diagnostic verification on the traffic or electricity datasets. The reported gains could be dataset-specific artifacts rather than a general property of the VB+ARD construction; an ablation or residual analysis checking whether performance degrades when the linear-dynamics assumption is violated is needed to support the superiority claim.
  2. [Experiments section] Abstract and §4 claim superior imputation/outlier rejection without providing quantitative tables, error bars, or baseline implementation details in the visible summary; the full experimental section must include these to allow verification of the performance advantage over deterministic counterparts.
minor comments (1)
  1. [§2] Notation for the state-transition matrix and ARD hyperparameters should be introduced with explicit definitions in the model section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we respond point-by-point to the major comments.

read point-by-point responses
  1. Referee: [§2 and Experiments section] The central modeling assumption (sequential data exactly generated by a low-rank subspace whose evolution follows a linear state-transition matrix) is stated in the abstract and §2 but receives no diagnostic verification on the traffic or electricity datasets. The reported gains could be dataset-specific artifacts rather than a general property of the VB+ARD construction; an ablation or residual analysis checking whether performance degrades when the linear-dynamics assumption is violated is needed to support the superiority claim.

    Authors: We agree that an explicit check of the linear-dynamics assumption on the real datasets would strengthen the paper. Although the ARD priors confer some robustness and the model follows standard subspace-tracking assumptions, we will add a residual-analysis subsection in the revised experiments that compares reconstruction residuals under the linear transition model against a nonparametric baseline to assess sensitivity to violations of the assumption. revision: yes

  2. Referee: [Experiments section] Abstract and §4 claim superior imputation/outlier rejection without providing quantitative tables, error bars, or baseline implementation details in the visible summary; the full experimental section must include these to allow verification of the performance advantage over deterministic counterparts.

    Authors: Section 4 of the manuscript already contains quantitative tables reporting imputation, outlier-rejection, and prediction errors together with standard deviations across repeated trials and explicit descriptions of all baseline implementations and parameter settings. We will ensure these tables and details remain prominent and will add any missing implementation hyperparameters if the referee identifies specific omissions. revision: no

Circularity Check

0 steps flagged

No circularity: derivation applies standard VB to stated generative assumption without reduction to inputs

full rationale

The paper states a generative model assumption (sequential data from low-dim time-varying subspace with linear state-transition dynamics) and applies variational Bayesian inference with ARD priors to learn subspace and transition matrix. No equations, self-citations, or fitted quantities are shown in the provided text that reduce any claimed prediction or result to the inputs by construction. The forward-backward algorithm and performance claims on traffic/electricity data are presented as consequences of the model and inference procedure, not as self-referential fits. This is the normal case of a modeling paper whose central steps remain independent of the target outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; therefore the ledger is necessarily incomplete and reflects only the high-level modeling assumptions stated in the abstract.

axioms (1)
  • domain assumption Sequential multivariate measurements arise from a low-dimensional time-varying subspace whose evolution is captured by a state-transition matrix.
    Explicitly stated in the abstract as the generative model the method is designed to fit.

pith-pipeline@v0.9.0 · 5688 in / 1057 out tokens · 28509 ms · 2026-05-25T17:27:05.098214+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

  1. [1]

    Exact matrix completion via convex optimization,

    E. J. Cand `es and B. Recht, “Exact matrix completion via convex optimization,” Found. Comp. Math., vol. 9, no. 6, pp. 717–772, 2009

  2. [2]

    Online identification and tracking of subspaces from highly incomplete information,

    L. Balzano, R. Nowak, and B. Recht, “Online identification and tracking of subspaces from highly incomplete information,” in Proc. of IEEE Allerton, Sept. 2010, pp. 704–711

  3. [4]

    Online sparse and low-rank subspace learning from incomplete data: A Bayesian view,

    P. V . Giampouras, A. A. Rontogiannis, K. E. Themelis, and K. D. Koutroumbas, “Online sparse and low-rank subspace learning from incomplete data: A Bayesian view,” Signal Processing , vol. 137, pp. 199–212, 2017

  4. [5]

    Robust principal component analysis?

    E. J. Cand `es, X. Li, Y . Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM , vol. 58, no. 3, pp. 11:1–11:37, 2011

  5. [6]

    Bayesian robust principal component analysis,

    X. Ding, L. He, and L. Carin, “Bayesian robust principal component analysis,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3419–3430, 2011

  6. [7]

    Tensor completion for estimating missing values in visual data,

    J. Liu, P. Musialski, P. Wonka, and J. Ye, “Tensor completion for estimating missing values in visual data,” IEEE Trans. Pattern Anal. Mach. Intell, vol. 35, no. 1, pp. 208–220, 2013

  7. [8]

    Electricity load diagrams 2011-2014,

    UCI, “Electricity load diagrams 2011-2014,” 2014. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/ ElectricityLoadDiagrams20112014/

  8. [9]

    Online Robust Subspace Tracking from Partial Information

    J. He, L. Balzano, and J. Lui, “Online robust subspace tracking from partial information,” arXiv preprint arXiv:1109.3827 , 2011

  9. [10]

    A robust online subspace estimation and tracking algorithm,

    H. Mansour and X. Jiang, “A robust online subspace estimation and tracking algorithm,” in Proc. of the IEEE ICASSP, Apr. 2015, pp. 4065– 4069

  10. [11]

    Temporal regularized matrix factorization for high-dimensional time series prediction,

    H.-F. Yu, N. Rao, and I. S. Dhillon, “Temporal regularized matrix factorization for high-dimensional time series prediction,” in Proc. of NIPS, Barcelona, Spain., Dec. 2016, pp. 847–855

  11. [12]

    Dynamic matrix recovery from incomplete observations under an exact low-rank constraint,

    L. Xu and M. Davenport, “Dynamic matrix recovery from incomplete observations under an exact low-rank constraint,” in Proc. of NIPS , Barcelona, Spain., Dec. 2016, pp. 3585–3593

  12. [13]

    Robust PCA via outlier pursuit,

    H. Xu, C. Caramanis, and S. Sanghavi, “Robust PCA via outlier pursuit,” in Proc. of NIPS , Vancouver, Canada, Dec. 2010, pp. 2496–2504

  13. [14]

    Online forecasting matrix factorization,

    S. Gultekin and J. Paisley, “Online forecasting matrix factorization,” IEEE Trans. Signal Process. , vol. 67, no. 5, pp. 1223–1236, March. 2019

  14. [15]

    Bilinear generalized approxi- mate message passing Part I: Derivation,

    J. T. Parker, P. Schniter, and V . Cevher, “Bilinear generalized approxi- mate message passing Part I: Derivation,” IEEE Trans. Signal Process., vol. 62, no. 22, pp. 5839–5853, 2014

  15. [16]

    Bilinear generalized approximate message passing Part II: Appli- cations,

    ——, “Bilinear generalized approximate message passing Part II: Appli- cations,” IEEE Trans. Signal Process. , vol. 62, no. 22, pp. 5854–5867, 2014

  16. [17]

    Exploring algorithmic limits of matrix rank minimization under affine constraints,

    B. Xin, Y . Wang, W. Gao, and D. Wipf, “Exploring algorithmic limits of matrix rank minimization under affine constraints,” IEEE Trans. Signal Process., vol. 64, no. 19, pp. 4960–4974, 2016

  17. [18]

    Fast low-rank Bayesian matrix completion with hierarchical gaussian prior models,

    L. Yang, J. Fang, H. Duan, H. Li, and B. Zeng, “Fast low-rank Bayesian matrix completion with hierarchical gaussian prior models,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2804–2817, 2018

  18. [19]

    Matrix and tensor based methods for missing data estimation in large traffic networks,

    M. T. Asif, N. Mitrovic, J. Dauwels, and P. Jaillet, “Matrix and tensor based methods for missing data estimation in large traffic networks,” IEEE Trans. Intell. Transp. Syst. , vol. 17, no. 7, pp. 1816–1825, 2016

  19. [20]

    Fast variational Bayesian linear state-space model,

    J. Luttinen, “Fast variational Bayesian linear state-space model,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2013, pp. 305–320

  20. [21]

    Variational Bayesian matrix factorization for bounded support data,

    Z. Ma, A. E. Teschendorff, A. Leijon, Y . Qiao, H. Zhang, and J. Guo, “Variational Bayesian matrix factorization for bounded support data,” IEEE Trans. Pattern Anal. Mach. Intell , vol. 37, no. 4, pp. 876–889, 2015

  21. [22]

    Review of road traffic control strategies,

    M. Papageorgiou, C. Diakaki, V . Dinopoulou, A. Kotsialos, and Y . Wang, “Review of road traffic control strategies,” Proc. of the IEEE , vol. 91, no. 12, pp. 2043–2067, 2003

  22. [23]

    On-demand high-capacity ride-sharing via dynamic trip-vehicle assign- ment,

    J. Alonso-Mora, S. Samaranayake, A. Wallar, E. Frazzoli, and D. Rus, “On-demand high-capacity ride-sharing via dynamic trip-vehicle assign- ment,” Proc. of the National Academy of Sciences , vol. 114, no. 3, pp. 462–467, 2017

  23. [24]

    Moving around in Indian cities,

    D. Mohan, “Moving around in Indian cities,” Economic and Political Weekly, vol. 48, no. 48, 2013

  24. [25]

    Access–egress and other travel characteristics of metro users in Delhi and its satellite cities,

    R. Goel and G. Tiwari, “Access–egress and other travel characteristics of metro users in Delhi and its satellite cities,” IATSS Research, vol. 39, no. 2, pp. 164–172, 2016

  25. [26]

    PPCA-based missing data imputation for traffic flow volume: A systematical approach,

    L. Qu, L. Li, Y . Zhang, and J. Hu, “PPCA-based missing data imputation for traffic flow volume: A systematical approach,” IEEE Trans. Intell. Transp. Syst., vol. 10, no. 3, pp. 512–522, 2009

  26. [27]

    A BPCA based missing value imputing method for traffic flow volume data,

    L. Qu, Y . Zhang, J. Hu, L. Jia, and L. Li, “A BPCA based missing value imputing method for traffic flow volume data,” in Proc. of the IEEE Symp. Intelligent Vehicles , 2008, pp. 985–990

  27. [28]

    Short-term traffic prediction based on dynamic tensor completion,

    H. Tan, Y . Wu, B. Shen, P. J. Jin, and B. Ran, “Short-term traffic prediction based on dynamic tensor completion,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 8, pp. 2123–2133, 2016

  28. [29]

    Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification,

    J. Guo, W. Huang, and B. M. Williams, “Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification,” Transportation Research Part C: Emerging Technologies, vol. 43, pp. 50–64, 2014

  29. [30]

    A short- term user load forecasting with missing data,

    Z. Zhang, G. Liang, Y .-j. Dai, X.-u. Dong, and P.-x. Wang, “A short- term user load forecasting with missing data,” in 2018 International Conference on Mechanical, Electronic and Information Technology, Apr. 2018, pp. 395–400

  30. [31]

    Sparse Bayesian methods for low-rank matrix estimation,

    S. D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos, “Sparse Bayesian methods for low-rank matrix estimation,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 3964–3977, 2012

  31. [32]

    Low- dimensional models for missing data imputation in road networks,

    M. T. Asif, N. Mitrovic, L. Garg, J. Dauwels, and P. Jaillet, “Low- dimensional models for missing data imputation in road networks,” in Proc. of the IEEE ICASSP , May. 2013, pp. 3527–3531

  32. [33]

    C. M. Bishop, Pattern Recognition and Machine Learning . Springer, 2006

  33. [34]

    Variational algorithms for approximate Bayesian inference,

    M. J. Beal, “Variational algorithms for approximate Bayesian inference,” Ph.D. dissertation, University of London London, 2003

  34. [35]

    The variational approximation for Bayesian inference,

    D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The variational approximation for Bayesian inference,” IEEE Signal Process. Mag. , vol. 25, no. 6, pp. 131–146, 2008

  35. [36]

    Online model selection based on the variational Bayes,

    M.-A. Sato, “Online model selection based on the variational Bayes,” Neural Computation, vol. 13, no. 7, pp. 1649–1681, 2001

  36. [37]

    Convergence of a block coordinate descent method for nondifferentiable minimization,

    P. Tseng, “Convergence of a block coordinate descent method for nondifferentiable minimization,” Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475–494, 2001