Bayesian Multivariate Sparse Functional Principal Components Analysis
Pith reviewed 2026-05-18 19:05 UTC · model grok-4.3
The pith
MSFAST is a Bayesian method that estimates principal components for multivariate sparse functional data while accounting for uncertainty in the components.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MSFAST represents the principal components using orthonormal splines and samples the spline coefficients via parameter expansion in a Bayesian model tailored to multivariate, sparsely observed functional data. It adds six extensions to FAST: standardization of each functional covariate to handle scale differences, a better-suited orthogonal spline basis, updated parameterizations for computational stability, multi-core and multi-thread acceleration, Procrustes alignment of posterior PC samples, and efficient prediction routines. The authors claim that this framework produces valid inferences that reflect uncertainty in the principal components and yields accurate estimates, particularly when the signal-to-noise ratio is small.
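To make the orthonormal-spline representation concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It orthonormalizes a truncated-power spline basis (a stand-in chosen for brevity; the paper's basis may differ) under a trapezoid-rule L2 inner product, then checks that orthonormal coefficients give orthonormal component functions. The names orthonormal_basis, Psi, and phi are hypothetical.

```python
import numpy as np

def orthonormal_basis(t, degree=3, n_knots=4):
    """Orthonormalize a truncated-power spline basis on the grid t.

    Orthonormality is with respect to the trapezoid-rule approximation
    of the L2 inner product on [t[0], t[-1]].
    """
    knots = np.linspace(t[0], t[-1], n_knots + 2)[1:-1]  # interior knots
    cols = [t**d for d in range(degree + 1)]
    cols += [np.clip(t - k, 0.0, None) ** degree for k in knots]
    B = np.column_stack(cols)

    # Trapezoid quadrature weights.
    w = np.zeros_like(t)
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2

    # QR on the weight-scaled basis, then undo the scaling.
    sw = np.sqrt(w)
    Q, _ = np.linalg.qr(B * sw[:, None])
    return Q / sw[:, None], w

t = np.linspace(0.0, 1.0, 101)
basis, w = orthonormal_basis(t)

# Orthonormal coefficient matrix (K x J) gives J orthonormal functions.
rng = np.random.default_rng(0)
Psi, _ = np.linalg.qr(rng.standard_normal((basis.shape[1], 2)))
phi = basis @ Psi
gram = phi.T @ (w[:, None] * phi)  # approximately the 2 x 2 identity
```

Because the basis is orthonormal, constraining the coefficient matrix to have orthonormal columns is enough to make the component functions orthonormal, which is why the sampling problem reduces to the coefficients.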
What carries the argument
The Bayesian hierarchical model in MSFAST, which represents the principal components as orthonormal spline expansions, samples their coefficients from the posterior via parameter expansion, and aligns the posterior draws with a Procrustes procedure in the multivariate sparse setting.
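The alignment step can be illustrated with the classical orthogonal Procrustes solution. This is a sketch under assumptions: the choice of reference matrix and the function name procrustes_align are hypothetical, and the paper's procedure may differ in detail.

```python
import numpy as np

def procrustes_align(draw, reference):
    """Rotate/reflect `draw` (K x J) toward `reference` (K x J).

    Returns draw @ Q, where Q is the J x J orthogonal matrix that
    minimizes the Frobenius norm of draw @ Q - reference.
    """
    U, _, Vt = np.linalg.svd(draw.T @ reference)
    return draw @ (U @ Vt)

rng = np.random.default_rng(1)
reference, _ = np.linalg.qr(rng.standard_normal((8, 2)))

# Simulated "posterior draws": the reference, rotated and sign-flipped,
# plus a little noise.
theta = 2.0
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
flip = np.diag([1.0, -1.0])
draws = [reference @ R @ flip + 0.01 * rng.standard_normal((8, 2))
         for _ in range(5)]

aligned = [procrustes_align(d, reference) for d in draws]
err_before = max(np.linalg.norm(d - reference) for d in draws)
err_after = max(np.linalg.norm(a - reference) for a in aligned)
```

After alignment, the draws differ from the reference only by the added noise, so pointwise summaries across draws become meaningful.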
Load-bearing premise
The success of the method depends on standardization of each functional covariate being sufficient to fix posterior conditioning problems caused by differing scales, together with the orthogonal spline basis and Procrustes alignment delivering stable posterior samples of the principal components.
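A minimal sketch of per-covariate standardization follows. It assumes a single pooled mean and standard deviation per covariate; the paper may standardize differently (for example, with a mean function). The data values and the names standardize_covariates and unstandardize are hypothetical.

```python
import numpy as np

def standardize_covariates(obs):
    """Center and scale each covariate using its pooled observations.

    obs maps a covariate name to all observed values, pooled over
    subjects and time points.
    """
    scaled, params = {}, {}
    for name, y in obs.items():
        y = np.asarray(y, dtype=float)
        mu, sd = y.mean(), y.std(ddof=1)
        scaled[name] = (y - mu) / sd
        params[name] = (mu, sd)  # kept so results can be mapped back
    return scaled, params

def unstandardize(z, mu, sd):
    """Map standardized values back to the original scale."""
    return np.asarray(z, dtype=float) * sd + mu

# Made-up growth measurements on very different scales.
obs = {
    "height_cm": [50.1, 62.3, 70.8, 75.2, 81.0],
    "weight_kg": [3.4, 6.1, 8.0, 9.3, 10.4],
}
scaled, params = standardize_covariates(obs)
height_back = unstandardize(scaled["height_cm"], *params["height_cm"])
```

Keeping the centering and scaling constants matters because predictions and component functions estimated on the standardized scale must be transformed back for interpretation.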
What would settle it
A simulation experiment with known true principal components and a low signal-to-noise ratio, in which the MSFAST posterior credible intervals show coverage substantially below the nominal level, would indicate that the inferences are not valid in that regime and would undercut the claim of "uniquely valid inferences".
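The coverage check itself is simple to state in code. The sketch below uses a toy normal-mean model with a flat prior as a stand-in, not MSFAST; only the function empirical_coverage (a hypothetical name) would carry over to intervals for principal components evaluated on a grid.

```python
import numpy as np

def empirical_coverage(lower, upper, truth):
    """Fraction of intervals [lower, upper] that contain the truth."""
    lower, upper, truth = map(np.asarray, (lower, upper, truth))
    return float(np.mean((lower <= truth) & (truth <= upper)))

# Toy stand-in: known-variance normal mean with a flat prior, so the
# 95% credible interval is ybar +/- 1.96 * sigma / sqrt(n).
rng = np.random.default_rng(2)
n_sims, n_obs, sigma, mu_true = 2000, 20, 3.0, 1.0
ybar = rng.normal(mu_true, sigma / np.sqrt(n_obs), size=n_sims)
half_width = 1.959964 * sigma / np.sqrt(n_obs)

coverage = empirical_coverage(ybar - half_width, ybar + half_width, mu_true)
mc_se = np.sqrt(coverage * (1 - coverage) / n_sims)  # Monte Carlo SE
```

Coverage more than a few Monte Carlo standard errors below the nominal level would be the signal described above.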
Original abstract
Functional Principal Components Analysis (FPCA) provides a parsimonious, semi-parametric model for multivariate, sparsely-observed functional data. Frequentist FPCA approaches estimate principal components (PCs) from the data, then condition on these estimates in subsequent analyses. As an alternative, we propose a fully-Bayesian inferential framework for multivariate, sparse functional data (MSFAST) which explicitly models the PCs and incorporates their uncertainty. MSFAST builds upon the FAST approach to FPCA for univariate, densely-observed functional data. Like FAST, MSFAST represents PCs using orthonormal splines and samples the orthonormal spline coefficients using parameter expansion. MSFAST extends FAST to multivariate, sparsely-observed data by (1) standardizing each functional covariate to mitigate poor posterior conditioning due to disparate scales; (2) using a better-suited orthogonal spline basis; (3) updating parameterizations for computational stability; (4) introducing routines that leverage multiple cores and threads to accelerate compute; (5) using a Procrustes-based posterior PC alignment procedure; and (6) providing efficient prediction routines. We evaluate MSFAST alongside existing implementations using simulations. MSFAST produces uniquely valid inferences and accurate estimates, particularly in smaller signal-to-noise regimes. MSFAST is motivated by and applied to a study of child growth, with an accompanying vignette illustrating the implementation step-by-step.
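One published parameter-expansion device for sampling matrices with orthonormal columns is polar expansion: sample an unconstrained matrix and map it to its orthogonal polar factor. Whether MSFAST uses exactly this map is an assumption here; the sketch only illustrates the idea, and polar_factor is a hypothetical name.

```python
import numpy as np

def polar_factor(X):
    """Orthogonal polar factor of a full-rank K x J matrix X.

    Equals X @ (X.T @ X)^(-1/2), computed stably through the thin SVD.
    """
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(3)
X = rng.standard_normal((8, 3))   # unconstrained "expanded" parameter
Q = polar_factor(X)               # orthonormal columns: Q.T @ Q = I

# Cross-check against the defining formula X (X^T X)^(-1/2).
evals, evecs = np.linalg.eigh(X.T @ X)
inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
Q_direct = X @ inv_sqrt
```

A sampler can then work on the unconstrained matrix, where gradient-based methods apply directly, while the model only ever sees the orthonormal factor.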
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MSFAST, a fully Bayesian extension of the FAST method for functional principal components analysis (FPCA) applied to multivariate, sparsely observed functional data. It explicitly models the principal components (PCs) using orthonormal splines and parameter expansion, incorporates their posterior uncertainty, and introduces six extensions: per-covariate standardization, an improved orthogonal spline basis, updated parameterizations for stability, parallel computing routines, a Procrustes-based posterior alignment procedure, and efficient prediction methods. Simulations are claimed to show that MSFAST yields uniquely valid inferences and more accurate estimates than existing approaches, especially in low signal-to-noise regimes; the method is illustrated on a child growth study.
Significance. If the central claims hold, this would represent a useful methodological advance in functional data analysis by supplying a coherent Bayesian framework that propagates uncertainty from PC estimation into downstream inferences, an aspect often ignored in frequentist FPCA. The emphasis on computational scalability via parallelization and prediction routines, together with the explicit handling of multivariate sparse sampling, addresses practical needs in applications such as longitudinal biomedical studies.
major comments (3)
- [§3] §3 (Model extensions), paragraph on standardization: the assertion that standardizing each functional covariate 'sufficiently mitigates poor posterior conditioning due to disparate scales' is presented without a supporting sensitivity analysis or theoretical bound; when covariates retain residual scale differences or exhibit highly irregular sparse sampling, the joint posterior for the multivariate orthonormal spline coefficients may remain ill-conditioned, directly undermining the claim of stable and identifiable posterior PCs.
- [§4.2] §4.2 (Posterior alignment), Procrustes procedure: the alignment step is introduced to enforce rotational stability across posterior draws, yet no diagnostic is reported (e.g., pre- versus post-alignment trace plots of the leading eigenvalues or cross-covariate correlations) to verify that the procedure does not distort the joint posterior geometry; without such checks the uncertainty quantification central to the 'uniquely valid inferences' claim cannot be confirmed.
- [§5] §5 (Simulation study): the abstract states that MSFAST produces 'accurate estimates, particularly in smaller signal-to-noise regimes,' but the reported results lack quantitative metrics (bias, coverage, or RMSE with error bars) and explicit exclusion criteria for low-SNR settings; this absence prevents assessment of whether the claimed gains over frequentist FPCA are load-bearing or merely suggestive.
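The conditioning concern in the first major comment can be probed numerically. The sketch below uses the sample covariance of two simulated covariates as a rough proxy for posterior conditioning, which is an assumption; it is not the diagnostic used in the paper, and the scales and correlation are made up.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two correlated covariates on very different scales (made-up values).
corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
L = np.linalg.cholesky(corr)
z = rng.standard_normal((500, 2)) @ L.T
x = z * np.array([10.0, 0.1])  # e.g. centimeters vs. a small-unit score

# Condition number of the covariance before and after standardization.
cond_raw = np.linalg.cond(np.cov(x, rowvar=False))
x_std = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
cond_std = np.linalg.cond(np.cov(x_std, rowvar=False))
```

Standardization removes the part of the ill-conditioning that comes from scale alone; what remains reflects correlation between covariates, which is the residual risk the referee points to.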
minor comments (2)
- [§2] The notation for the multivariate spline coefficients and the precise form of the parameter-expansion prior could be clarified with an explicit equation reference in the model section.
- [§5] Figure captions for the simulation results should include the exact SNR values and sample sizes used in each panel to improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have helped us identify opportunities to strengthen the manuscript. We address each major comment below and describe the revisions we plan to make in response.
Point-by-point responses
- Referee: [§3] §3 (Model extensions), paragraph on standardization: the assertion that standardizing each functional covariate 'sufficiently mitigates poor posterior conditioning due to disparate scales' is presented without a supporting sensitivity analysis or theoretical bound; when covariates retain residual scale differences or exhibit highly irregular sparse sampling, the joint posterior for the multivariate orthonormal spline coefficients may remain ill-conditioned, directly undermining the claim of stable and identifiable posterior PCs.
  Authors: We appreciate the referee's observation. The per-covariate standardization is motivated by standard practices for handling scale differences in multivariate settings and is intended to improve posterior conditioning in the orthonormal spline coefficient model. We agree that the current presentation would be strengthened by empirical support. In the revised manuscript we will add a sensitivity analysis that examines the condition number of the posterior covariance matrix (or equivalent diagnostics) across a range of residual scale disparities and irregular sparse sampling patterns, both with and without standardization.
  Revision: yes
- Referee: [§4.2] §4.2 (Posterior alignment), Procrustes procedure: the alignment step is introduced to enforce rotational stability across posterior draws, yet no diagnostic is reported (e.g., pre- versus post-alignment trace plots of the leading eigenvalues or cross-covariate correlations) to verify that the procedure does not distort the joint posterior geometry; without such checks the uncertainty quantification central to the 'uniquely valid inferences' claim cannot be confirmed.
  Authors: We thank the referee for highlighting the need for verification of the Procrustes alignment. The procedure is a standard orthogonal transformation used to resolve label switching due to rotational invariance in PCA. To confirm that it does not materially alter the joint posterior geometry, we will include diagnostic comparisons (pre- versus post-alignment trace plots of the leading eigenvalues and cross-covariate correlations) in the revised manuscript or supplementary material.
  Revision: yes
- Referee: [§5] §5 (Simulation study): the abstract states that MSFAST produces 'accurate estimates, particularly in smaller signal-to-noise regimes,' but the reported results lack quantitative metrics (bias, coverage, or RMSE with error bars) and explicit exclusion criteria for low-SNR settings; this absence prevents assessment of whether the claimed gains over frequentist FPCA are load-bearing or merely suggestive.
  Authors: We acknowledge that while the simulation figures illustrate performance differences, explicit numerical summaries would facilitate direct assessment. In the revision we will add a table reporting bias, RMSE, and coverage probabilities (with standard errors) for the principal component estimates and downstream quantities, and we will explicitly state the SNR thresholds and any exclusion rules used to define the low signal-to-noise regimes.
  Revision: yes
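The numerical summaries promised in the last response could be assembled along these lines. This is a sketch with simulated estimates, not results from the paper; the function name summarize and the toy bias and noise levels are hypothetical.

```python
import numpy as np

def summarize(estimates, truth):
    """Bias and RMSE across simulation replicates, with Monte Carlo SEs."""
    err = np.asarray(estimates, dtype=float) - truth
    n = err.size
    bias = err.mean()
    bias_se = err.std(ddof=1) / np.sqrt(n)
    rmse = np.sqrt(np.mean(err**2))
    # Delta method: se(RMSE) is approximately se(MSE) / (2 * RMSE).
    rmse_se = (err**2).std(ddof=1) / np.sqrt(n) / (2 * rmse)
    return {"bias": bias, "bias_se": bias_se,
            "rmse": rmse, "rmse_se": rmse_se}

# Toy replicates: estimator with true bias 0.1 and noise sd 0.5.
rng = np.random.default_rng(5)
truth = 2.0
estimates = truth + 0.1 + 0.5 * rng.standard_normal(4000)
row = summarize(estimates, truth)
```

Coverage can be tabulated in the same way, as the mean of an interval-contains-truth indicator with a binomial standard error. One row per method and SNR setting then gives the table the referee asks for.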
Circularity Check
No significant circularity; MSFAST extends prior FAST with independent modeling and alignment steps evaluated via simulation
The paper proposes MSFAST as a new fully Bayesian framework that explicitly adds six extensions to the earlier FAST method, including per-covariate standardization, an improved orthogonal spline basis, Procrustes posterior alignment, and efficient prediction routines. These are presented as new parameterizations and procedures rather than derivations that reduce to the inputs by construction. Central claims of uniquely valid inferences and improved accuracy in low-SNR regimes are assessed through simulation comparisons, not through self-referential definitions or load-bearing self-citations that would force the result. The derivation chain therefore introduces independent content and remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Orthonormal splines can represent the principal components, with parameter expansion allowing sampling of the coefficients.
- ad hoc to paper: Standardization of each functional covariate mitigates poor posterior conditioning from disparate scales.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: "MSFAST extends FAST to multivariate, sparsely-observed data by (1) standardizing each functional covariate... (5) using a Procrustes-based posterior PC alignment procedure"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: "We represent PCs using orthonormal splines and sample the orthonormal spline coefficients using parameter expansion"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Sufficient conditions for proper posteriors in fully-Bayesian Functional PCA
  No additional conditions beyond the spline projection and mixed-effects equivalence are needed for the smoothing prior and posterior to be proper in fully-Bayesian FPCA.