Consistency of the Bayesian Information Criterion for Model Selection in Exploratory Factor Analysis
Pith reviewed 2026-05-10 17:47 UTC · model grok-4.3
The pith
The Bayesian information criterion is strongly consistent for selecting the pseudo-true factor order in exploratory factor analysis even under misspecification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The BIC is strongly consistent for the pseudo-true factor order under misspecification provided all globally optimal models share a common pseudo-true covariance set, the population Gaussian criterion has a local quadratic margin away from that set, and the BIC complexity counts are order-separating at the pseudo-true order. The selection target is the smallest candidate factor order that yields the best Gaussian approximation, in Kullback-Leibler divergence, to the data-generating covariance structure. The candidate models may have an unknown mean vector, exact-zero restrictions in the loading matrix, and either diagonal or spherical error covariance structures. Under correct specification, the assumptions reduce to familiar properties of the true covariance matrix.
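The selection target can be written out as a short formula. The notation below is assumed for illustration and is not taken from the paper: \Sigma_0 is the data-generating covariance, \mathcal{M}_q is the compact covariance class for candidate order q, and p is the dimension. The Kullback-Leibler identity is stated for Gaussian data with matched means; it is the standard Gaussian formula, so minimizing L over a class is the same as minimizing the divergence over that class.

```latex
% Assumed notation (hypothetical, not from the source):
%   \Sigma_0      data-generating covariance
%   \mathcal{M}_q compact covariance class of candidate order q
%   p             dimension of the observed vector
L(\Sigma) = \log\det\Sigma + \operatorname{tr}\!\left(\Sigma^{-1}\Sigma_0\right),
\qquad
\mathrm{KL}\!\left(N(\mu_0,\Sigma_0)\,\middle\|\,N(\mu_0,\Sigma)\right)
  = \tfrac{1}{2}\left[\,L(\Sigma) - \log\det\Sigma_0 - p\,\right],
\\[4pt]
q^{\ast} = \min\Bigl\{\, q \;:\; \min_{\Sigma\in\mathcal{M}_q} L(\Sigma)
  = \min_{k}\,\min_{\Sigma\in\mathcal{M}_k} L(\Sigma) \Bigr\}.
```

Read this way, q^{\ast} is the smallest order whose class already attains the best achievable population fit, which matches the "smallest candidate factor order" wording of the claim.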
What carries the argument
The BIC penalty applied to maximized Gaussian log-likelihood over compact covariance classes, with the consistency argument carried out directly in covariance space to accommodate singularities from rotations and redundant factors.
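As a rough illustration of the penalized criterion, the sketch below computes a BIC for the spherical-error case only, where the maximized Gaussian log-likelihood has a well-known closed form in the eigenvalues of the (divide-by-n) sample covariance. The function names `spherical_bic` and `select_order` are hypothetical, and the complexity count used here (mean, loadings up to rotation, one error variance) is an assumption; the paper's own counts, for instance under exact-zero loading restrictions, may differ.

```python
import math


def spherical_bic(eigvals, n, q):
    """BIC of a q-factor model with spherical errors.

    eigvals: eigenvalues of the ML (divide-by-n) sample covariance.
    n: sample size. q: candidate factor order, 0 <= q < p.
    """
    lam = sorted(eigvals, reverse=True)
    p = len(lam)
    if not 0 <= q < p:
        raise ValueError("q must satisfy 0 <= q < p")
    # ML estimate of the common error variance: mean of the discarded eigenvalues.
    sigma2 = sum(lam[q:]) / (p - q)
    # -2 * maximized Gaussian log-likelihood, mean profiled out at the sample mean.
    deviance = n * (
        p * math.log(2 * math.pi)
        + sum(math.log(ev) for ev in lam[:q])
        + (p - q) * math.log(sigma2)
        + p
    )
    # ASSUMED complexity count: mean (p) + loadings modulo rotation + error variance (1).
    dim = p + p * q - q * (q - 1) // 2 + 1
    return deviance + dim * math.log(n)


def select_order(eigvals, n, max_q):
    """Return the order in 0..max_q with the smallest BIC (smallest order on ties)."""
    bics = [spherical_bic(eigvals, n, q) for q in range(max_q + 1)]
    return min(range(len(bics)), key=bics.__getitem__)


# Two dominant eigenvalues followed by a flat noise floor.
eigvals = [5.0, 3.0, 1.0, 1.0, 1.0, 1.0]
print(select_order(eigvals, n=500, max_q=4))  # prints 2
```

Because the function works only with eigenvalues of a covariance matrix, it never touches a loading parametrization, which is loosely in the spirit of the covariance-space argument described above.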
If this is right
- The BIC selects the minimal factor order achieving the smallest population Kullback-Leibler divergence to the true covariance.
- The consistency result covers models with unknown means, exact-zero loading restrictions, and both diagonal and spherical error structures.
- Under correct specification the assumptions reduce to standard regularity properties of the true covariance matrix.
- The same consistency proof applies to other information criteria whose penalties satisfy equivalent gap conditions.
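The last point, that only the penalty changes between criteria, can be sketched with a generic selector. Everything here is illustrative: the deviances and dimension counts are made-up numbers, the names `PENALTIES` and `select_order` are hypothetical, and no claim is made about which of these penalties satisfy the paper's gap conditions. AIC is included only as a familiar contrast whose penalty does not grow with n.

```python
import math

# Penalty as a function of model dimension and sample size.
PENALTIES = {
    "AIC": lambda dim, n: 2.0 * dim,
    "BIC": lambda dim, n: dim * math.log(n),
    "HQ": lambda dim, n: 2.0 * dim * math.log(math.log(n)),  # Hannan-Quinn
}


def select_order(deviances, dims, n, penalty):
    """Index of the candidate minimizing deviance + penalty (smallest index on ties)."""
    scores = [dev + penalty(dim, n) for dev, dim in zip(deviances, dims)]
    return min(range(len(scores)), key=scores.__getitem__)


# HYPOTHETICAL inputs: -2 * maximized log-likelihood and dimension for orders 0..4.
deviances = [2079.4, 1645.9, 1354.0, 1344.0, 1342.0]
dims = [7, 13, 18, 22, 25]

choices = {name: select_order(deviances, dims, 500, pen) for name, pen in PENALTIES.items()}
print(choices)  # {'AIC': 3, 'BIC': 2, 'HQ': 2}
```

In this toy input the small deviance gain from a third factor is enough to beat the constant AIC penalty but not the two penalties that grow with n.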
Where Pith is reading between the lines
- Practitioners using factor analysis on approximately structured data can apply BIC with greater assurance that the selected order corresponds to the best available approximation rather than an artifact of the penalty.
- The direct covariance-space technique may extend to consistency proofs for information criteria in other singular latent-variable settings such as reduced-rank regression or certain mixture models.
- Empirical verification of the quadratic margin condition on fitted covariances from real datasets would indicate the range of practical problems where the consistency guarantee applies.
Load-bearing premise
All globally optimal models share a common pseudo-true covariance set around which the population Gaussian criterion has a local quadratic margin.
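One plausible reading of the margin condition is given below as a sketch; the paper's exact formulation is not reproduced on this page and may differ. Assumed notation: L is the population Gaussian criterion L(\Sigma) = \log\det\Sigma + \operatorname{tr}(\Sigma^{-1}\Sigma_0), L^{\ast} is its minimum over all candidate classes, \mathcal{S}^{\ast} is the common pseudo-true covariance set, and \operatorname{dist} is a matrix-norm distance to that set.

```latex
% Hypothetical formalization (assumed, not quoted from the paper):
% there exist constants c > 0 and \delta > 0 such that
L(\Sigma) - L^{\ast} \;\ge\; c\,\operatorname{dist}\!\left(\Sigma,\mathcal{S}^{\ast}\right)^{2}
\qquad
\text{for every candidate } \Sigma \text{ with }
\operatorname{dist}\!\left(\Sigma,\mathcal{S}^{\ast}\right) \le \delta .
```

Under this reading, the condition rules out criteria that are flatter than quadratic near the optimal set.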
What would settle it
A simulation study generating data from a covariance where the local quadratic margin condition fails around the optimal set, then checking whether BIC selects the wrong factor order with positive probability.
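A Monte Carlo harness for such a study might look like the sketch below. It assumes NumPy is available, uses the spherical-error closed form for the BIC, and adopts an assumed complexity count; the names `spherical_bic` and `selection_frequencies` are hypothetical. It does not construct a covariance that violates the margin condition: the example covariance is a correctly specified two-factor model, and a covariance of interest would have to be supplied in its place.

```python
import numpy as np


def spherical_bic(eigvals, n, q):
    """BIC of a q-factor spherical-error model from sample-covariance eigenvalues."""
    lam = np.sort(eigvals)[::-1]
    p = lam.size
    sigma2 = lam[q:].mean()  # ML error variance: mean of discarded eigenvalues
    deviance = n * (p * np.log(2 * np.pi) + np.log(lam[:q]).sum()
                    + (p - q) * np.log(sigma2) + p)
    dim = p + p * q - q * (q - 1) // 2 + 1  # ASSUMED complexity count
    return deviance + dim * np.log(n)


def selection_frequencies(cov, n, max_q, reps=200, seed=0):
    """Fraction of replications in which BIC selects each order 0..max_q."""
    rng = np.random.default_rng(seed)
    p = cov.shape[0]
    counts = np.zeros(max_q + 1, dtype=int)
    for _ in range(reps):
        x = rng.multivariate_normal(np.zeros(p), cov, size=n)
        s = np.cov(x, rowvar=False, bias=True)  # divide-by-n sample covariance
        eigvals = np.linalg.eigvalsh(s)
        bics = [spherical_bic(eigvals, n, q) for q in range(max_q + 1)]
        counts[int(np.argmin(bics))] += 1
    return counts / reps


# Illustrative covariance: two well-separated factors plus unit spherical noise.
loadings = np.zeros((6, 2))
loadings[:3, 0] = 1.0
loadings[3:, 1] = 0.8
cov = loadings @ loadings.T + np.eye(6)

freq = selection_frequencies(cov, n=500, max_q=4)
print(freq)  # selection frequency for orders 0, 1, 2, 3, 4
```

Repeating the run over a grid of sample sizes and watching whether the frequency of a wrong order decays to zero is one way to probe the consistency claim empirically for a given covariance.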
Original abstract
We study model selection by the Bayesian information criterion (BIC) in fixed-dimensional exploratory factor analysis over a fixed finite family of compact covariance classes. Our main result shows that the BIC is strongly consistent for the pseudo-true factor order under misspecification, provided that all globally optimal models share a common pseudo-true covariance set, the population Gaussian criterion has a local quadratic margin away from that set, and the BIC complexity counts are order-separating at the pseudo-true order. The candidate models may have an unknown mean vector, exact-zero restrictions in the loading matrix, and either diagonal or spherical error covariance structures, and the selection target is the smallest candidate factor order that yields the best Gaussian approximation, in Kullback-Leibler divergence, to the data-generating covariance structure. The proof works directly in covariance space, so it does not require a regular loading parametrization and accommodates the familiar singularities caused by rotations and redundant factors. Under correct specification, the assumptions reduce to familiar properties of the true covariance matrix. More generally, the same argument applies to other information criteria whose penalties satisfy the same gap conditions, including several BIC-type modifications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proves strong consistency of BIC for selecting the pseudo-true factor order (the smallest order achieving minimal KL Gaussian approximation to the true covariance) in fixed-dimensional exploratory factor analysis over a finite family of compact covariance classes, under misspecification. The main result holds provided three conditions: all globally optimal models share a common pseudo-true covariance set, the population Gaussian criterion has a local quadratic margin away from that set, and BIC complexity counts are order-separating at the pseudo-true order. The proof works directly in covariance space (avoiding regular loading parametrizations) to accommodate rotational singularities and redundant factors. The result reduces to standard properties under correct specification and extends to other information criteria with analogous penalties. Candidate models allow unknown means, exact-zero loading restrictions, and diagonal or spherical errors.
Significance. If the result holds, it supplies a rigorous justification for BIC-based selection of factor order in EFA even when the Gaussian factor model is misspecified, which is the typical practical case. The covariance-space approach is a clear technical strength because it sidesteps parametrization singularities that plague loading-matrix arguments. The explicit reduction to familiar conditions under correct specification and the generalization to other penalties are useful. This contributes to the literature on information-criterion consistency for singular models.
minor comments (1)
- [Abstract] The abstract is concise and lists the three conditions clearly; a single sentence noting that the same argument covers other BIC-type penalties could be added for immediate visibility.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of the manuscript and for recommending acceptance. The referee's summary accurately captures the main result, assumptions, and technical approach.
Circularity Check
No significant circularity
The paper establishes a conditional strong-consistency theorem for BIC selection of the pseudo-true factor order in EFA under misspecification. The derivation proceeds from three explicitly listed, independent assumptions (common pseudo-true covariance set across globally optimal models, local quadratic margin of the population Gaussian criterion, and order-separating BIC complexity counts) to the consistency conclusion. The proof strategy works directly in covariance space without requiring regular parametrizations. No step reduces by construction to a fitted quantity, self-definition, or self-citation chain; the assumptions are standard and falsifiable outside the target result. Under correct specification the assumptions simplify to familiar properties of the true covariance, but this is a reduction of the hypothesis class rather than circularity. The result is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (3)
- domain assumption: All globally optimal models share a common pseudo-true covariance set.
- domain assumption: The population Gaussian criterion has a local quadratic margin away from that set.
- domain assumption: The BIC complexity counts are order-separating at the pseudo-true order.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Our main result shows that the BIC is strongly consistent for the pseudo-true factor order under misspecification, provided that all globally optimal models share a common pseudo-true covariance set, the population Gaussian criterion has a local quadratic margin away from that set, and the BIC complexity counts are order-separating at the pseudo-true order."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The proof works directly in covariance space, so it does not require a regular loading parametrization and accommodates the familiar singularities caused by rotations and redundant factors."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.