arxiv: 2604.07998 · v1 · submitted 2026-04-09 · 🧮 math.ST · stat.TH

Consistency of the Bayesian Information Criterion for Model Selection in Exploratory Factor Analysis

Pith reviewed 2026-05-10 17:47 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords Bayesian information criterion · exploratory factor analysis · model selection · consistency · misspecification · pseudo-true order · covariance approximation · Kullback-Leibler divergence

The pith

The Bayesian information criterion is strongly consistent for selecting the pseudo-true factor order in exploratory factor analysis even under misspecification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that the Bayesian information criterion remains strongly consistent when selecting the number of factors in exploratory factor analysis, even if the true data-generating process lies outside every candidate model. The selection target is the smallest factor order that gives the closest Gaussian approximation, in Kullback-Leibler divergence, to the data-generating covariance structure. This matters because real-world data rarely follow an exact factor model, yet researchers still need a reliable way to choose model complexity under misspecification. The argument analyzes the criterion directly on covariance matrices rather than through loading parameters, which sidesteps identifiability problems such as factor rotations.

Core claim

The BIC is strongly consistent for the pseudo-true factor order under misspecification provided all globally optimal models share a common pseudo-true covariance set, the population Gaussian criterion has a local quadratic margin away from that set, and the BIC complexity counts are order-separating at the pseudo-true order. The selection target is the smallest candidate factor order that yields the best Gaussian approximation, in Kullback-Leibler divergence, to the data-generating covariance structure. The candidate models may have an unknown mean vector, exact-zero restrictions in the loading matrix, and either diagonal or spherical error covariance structures. Under correct specification, the assumptions reduce to familiar properties of the true covariance matrix.
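
The selection target and the criterion can be written out in symbols. The notation below is illustrative and is not taken from the paper: \Sigma_0 stands for the data-generating covariance, \mathcal{S}_k for the compact candidate covariance class of factor order k, d_k for the complexity count of that class, and \hat\ell_{n,k} for the maximized Gaussian log-likelihood over \mathcal{S}_k from n observations. Minimizing L below over \Sigma is equivalent to minimizing the Kullback-Leibler divergence to the Gaussian approximation with covariance \Sigma.

```latex
% Illustrative notation (an assumption, not the paper's own symbols).
\[
  L(\Sigma) = \log\det\Sigma + \operatorname{tr}\!\left(\Sigma^{-1}\Sigma_0\right),
  \qquad
  L_k^\ast = \min_{\Sigma \in \mathcal{S}_k} L(\Sigma),
\]
\[
  k^\ast = \min\Bigl\{ k : L_k^\ast = \min_{j} L_j^\ast \Bigr\}
  \quad\text{(pseudo-true order)},
\]
\[
  \mathrm{BIC}_n(k) = -2\,\hat\ell_{n,k} + d_k \log n,
  \qquad
  \hat k_n \in \arg\min_{k} \mathrm{BIC}_n(k),
\]
\[
  \text{strong consistency:}\quad
  \hat k_n = k^\ast \ \text{for all sufficiently large } n,\ \text{almost surely}.
\]
```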

What carries the argument

The BIC penalty applied to maximized Gaussian log-likelihood over compact covariance classes, with the consistency argument carried out directly in covariance space to accommodate singularities from rotations and redundant factors.
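
To make the covariance-space view concrete, here is a minimal sketch of BIC computed purely from covariance eigenvalues for the spherical-error case, where the Gaussian maximum likelihood fit has a well-known closed form (probabilistic PCA). Several things are assumptions of this sketch rather than the paper's setup: the function name `spherical_bic`, the parameter count `d_k` (mean, loadings modulo rotation, one noise variance), and the use of unconstrained maximization instead of the paper's compact covariance classes.

```python
import numpy as np

def spherical_bic(eigvals, n):
    """BIC for spherical-error factor models of order k = 0..p-1.

    eigvals: eigenvalues of the (1/n-normalized) sample covariance.
    The fit never touches a loading matrix, only the covariance spectrum.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = lam.size
    bic = np.empty(p)
    for k in range(p):
        # Closed-form MLE: keep the top k eigenvalues, average the rest.
        sigma2 = lam[k:].mean()
        logdet = np.log(lam[:k]).sum() + (p - k) * np.log(sigma2)
        # At the MLE, tr(Sigma_hat^{-1} S) = p.
        neg2loglik = n * (p * np.log(2 * np.pi) + logdet + p)
        # Assumed complexity count: mean + loadings up to rotation + noise.
        d_k = p + p * k - k * (k - 1) // 2 + 1
        bic[k] = neg2loglik + d_k * np.log(n)
    return bic

# Population-level illustration: two strong factors plus unit noise.
pop_eigs = [13.0, 13.0, 1.0, 1.0, 1.0, 1.0]
bic = spherical_bic(pop_eigs, n=2000)
k_hat = int(np.argmin(bic))
print(k_hat)
```

Because only the spectrum enters, rotating the loadings or adding a redundant factor leaves the fitted covariance unchanged, which is the kind of singularity the covariance-space argument is designed to tolerate.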

If this is right

  • The BIC selects the minimal factor order achieving the smallest population Kullback-Leibler divergence to the true covariance.
  • The consistency result covers models with unknown means, exact-zero loading restrictions, and both diagonal and spherical error structures.
  • Under correct specification the assumptions reduce to standard regularity properties of the true covariance matrix.
  • The same consistency proof applies to other information criteria whose penalties satisfy equivalent gap conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners using factor analysis on approximately structured data can apply BIC with greater assurance that the selected order corresponds to the best available approximation rather than an artifact of the penalty.
  • The direct covariance-space technique may extend to consistency proofs for information criteria in other singular latent-variable settings such as reduced-rank regression or certain mixture models.
  • Empirical verification of the quadratic margin condition on fitted covariances from real datasets would indicate the range of practical problems where the consistency guarantee applies.

Load-bearing premise

All globally optimal models share a common pseudo-true covariance set around which the population Gaussian criterion has a local quadratic margin.
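
One plausible way to formalize this premise is sketched below. It is a reading of the verbal condition, not the paper's statement: the symbols \Sigma_0 (data-generating covariance), \mathcal{S}_k (candidate class of order k), \mathcal{S}^\ast (common pseudo-true covariance set), L^\ast (globally optimal criterion value), and the constants c, \delta are all introduced here for illustration.

```latex
% Hypothetical formalization; all symbols are illustrative assumptions.
\[
  L(\Sigma) = \log\det\Sigma + \operatorname{tr}\!\left(\Sigma^{-1}\Sigma_0\right),
  \qquad
  L^\ast = \min_{j}\,\min_{\Sigma \in \mathcal{S}_j} L(\Sigma),
\]
\[
  \{\Sigma \in \mathcal{S}_k : L(\Sigma) = L^\ast\} = \mathcal{S}^\ast
  \quad\text{for every globally optimal order } k,
\]
\[
  L(\Sigma) - L^\ast \;\ge\; c\,\operatorname{dist}(\Sigma, \mathcal{S}^\ast)^2
  \quad\text{for all } \Sigma \in \mathcal{S}_k
  \text{ with } \operatorname{dist}(\Sigma, \mathcal{S}^\ast) \le \delta,
\]
\[
  \text{for some constants } c > 0,\ \delta > 0.
\]
```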

What would settle it

A simulation study generating data from a covariance where the local quadratic margin condition fails around the optimal set, then checking whether BIC selects the wrong factor order with positive probability.
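
A Monte Carlo harness for such a study could look like the sketch below. It is a skeleton under stated assumptions, not the paper's experiment: the helper names `select_order_spherical` and `selection_frequencies` are hypothetical, the candidate family is restricted to spherical-error models fitted in closed form, the complexity count `d_k` is an assumed one, and the data are Gaussian. The example covariance is a benign, correctly specified case; a covariance violating the margin condition would have to be supplied in its place.

```python
import numpy as np

def select_order_spherical(X):
    """Return the BIC-selected order among spherical-error models k = 0..p-1."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)   # 1/n-normalized sample covariance
    lam = np.linalg.eigvalsh(S)[::-1]        # eigenvalues, descending
    best_k, best_bic = 0, np.inf
    for k in range(p):
        sigma2 = lam[k:].mean()              # closed-form noise variance MLE
        logdet = np.log(lam[:k]).sum() + (p - k) * np.log(sigma2)
        d_k = p + p * k - k * (k - 1) // 2 + 1   # assumed complexity count
        bic = n * (p * np.log(2 * np.pi) + logdet + p) + d_k * np.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k

def selection_frequencies(Sigma0, n, reps, seed=0):
    """Estimate how often each order is selected for Gaussian data with covariance Sigma0."""
    rng = np.random.default_rng(seed)
    p = Sigma0.shape[0]
    chol = np.linalg.cholesky(Sigma0)
    counts = np.zeros(p, dtype=int)
    for _ in range(reps):
        X = rng.standard_normal((n, p)) @ chol.T
        counts[select_order_spherical(X)] += 1
    return counts / reps

# Benign placeholder covariance: two strong directions plus unit noise.
Sigma0 = np.diag([13.0, 13.0, 1.0, 1.0, 1.0, 1.0])
freqs = selection_frequencies(Sigma0, n=2000, reps=20)
print(freqs)
```

Repeating the run over a grid of increasing `n` and tracking the frequency of the target order would show whether wrong selections die out or persist.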

Original abstract

We study model selection by the Bayesian information criterion (BIC) in fixed-dimensional exploratory factor analysis over a fixed finite family of compact covariance classes. Our main result shows that the BIC is strongly consistent for the pseudo-true factor order under misspecification, provided that all globally optimal models share a common pseudo-true covariance set, the population Gaussian criterion has a local quadratic margin away from that set, and the BIC complexity counts are order-separating at the pseudo-true order. The candidate models may have an unknown mean vector, exact-zero restrictions in the loading matrix, and either diagonal or spherical error covariance structures, and the selection target is the smallest candidate factor order that yields the best Gaussian approximation, in Kullback-Leibler divergence, to the data-generating covariance structure. The proof works directly in covariance space, so it does not require a regular loading parametrization and accommodates the familiar singularities caused by rotations and redundant factors. Under correct specification, the assumptions reduce to familiar properties of the true covariance matrix. More generally, the same argument applies to other information criteria whose penalties satisfy the same gap conditions, including several BIC-type modifications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper proves strong consistency of BIC for selecting the pseudo-true factor order (the smallest order achieving minimal KL Gaussian approximation to the true covariance) in fixed-dimensional exploratory factor analysis over a finite family of compact covariance classes, under misspecification. The main result holds provided three conditions: all globally optimal models share a common pseudo-true covariance set, the population Gaussian criterion has a local quadratic margin away from that set, and BIC complexity counts are order-separating at the pseudo-true order. The proof works directly in covariance space (avoiding regular loading parametrizations) to accommodate rotational singularities and redundant factors. The result reduces to standard properties under correct specification and extends to other information criteria with analogous penalties. Candidate models allow unknown means, exact-zero loading restrictions, and diagonal or spherical errors.

Significance. If the result holds, it supplies a rigorous justification for BIC-based selection of factor order in EFA even when the Gaussian factor model is misspecified, which is the typical practical case. The covariance-space approach is a clear technical strength because it sidesteps parametrization singularities that plague loading-matrix arguments. The explicit reduction to familiar conditions under correct specification and the generalization to other penalties are useful. This contributes to the literature on information-criterion consistency for singular models.

minor comments (1)
  1. [Abstract] The abstract is concise and lists the three conditions clearly; a single sentence noting that the same argument covers other BIC-type penalties could be added for immediate visibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of the manuscript and for recommending acceptance. The referee's summary accurately captures the main result, assumptions, and technical approach.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper establishes a conditional strong-consistency theorem for BIC selection of the pseudo-true factor order in EFA under misspecification. The derivation proceeds from three explicitly listed, independent assumptions (common pseudo-true covariance set across globally optimal models, local quadratic margin of the population Gaussian criterion, and order-separating BIC complexity counts) to the consistency conclusion. The proof strategy works directly in covariance space without requiring regular parametrizations. No step reduces by construction to a fitted quantity, self-definition, or self-citation chain; the assumptions are standard and falsifiable outside the target result. Under correct specification the assumptions simplify to familiar properties of the true covariance, but this is a reduction of the hypothesis class rather than circularity. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claim rests on three explicit domain assumptions about model optimality, margin behavior, and penalty separation; no free parameters or invented entities are introduced.

axioms (3)
  • domain assumption All globally optimal models share a common pseudo-true covariance set
    Required for the BIC to select the pseudo-true order under misspecification.
  • domain assumption The population Gaussian criterion has a local quadratic margin away from that set
    Ensures the criterion behaves sufficiently well near the optimal set for consistency.
  • domain assumption The BIC complexity counts are order-separating at the pseudo-true order
    Guarantees the penalty term distinguishes different factor orders asymptotically.

pith-pipeline@v0.9.0 · 5494 in / 1370 out tokens · 37911 ms · 2026-05-10T17:47:48.395043+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    Our main result shows that the BIC is strongly consistent for the pseudo-true factor order under misspecification, provided that all globally optimal models share a common pseudo-true covariance set, the population Gaussian criterion has a local quadratic margin away from that set, and the BIC complexity counts are order-separating at the pseudo-true order.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

    Relation between the paper passage and the cited Recognition theorem.

    The proof works directly in covariance space, so it does not require a regular loading parametrization and accommodates the familiar singularities caused by rotations and redundant factors.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
