The interplay of signal-to-noise ratio and variance misspecification in Gaussian mixtures

Amnon Balanov; Tamir Bendory; Vladimir Serov

arXiv: 2605.02448 · v1 · submitted 2026-05-04 · eess.SP · math.ST · stat.TH

Pith reviewed 2026-05-08 18:22 UTC · model grok-4.3

classification eess.SP · math.ST · stat.TH

keywords Gaussian mixture models · variance misspecification · signal-to-noise ratio · maximum likelihood estimation · phase transitions · clustering · hard assignment · mean estimation

The pith

Variance misspecification in Gaussian mixtures produces an SNR-dependent phase diagram separating recovery, displacement, and collapse.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes maximum likelihood estimation of means in Gaussian mixture models when the variance τ² assumed by the likelihood differs from the true data variance σ², with the mismatch measured by the fixed ratio ρ = τ/σ. When the variances match, the estimates coincide with the true means regardless of signal strength. When the assumed variance is too small, the means shift away from the truth, and in the low-SNR regime the squared error grows as the inverse of SNR. When the assumed variance is too large, the fitted means collapse together once the mismatch exceeds a threshold that grows with SNR. The hard-assignment estimator appears as the zero-variance limit of the same family and shows its own SNR-dependent failures.

Core claim

The authors establish that the ratio ρ = τ/σ between the assumed and true standard deviations (so that ρ² is the ratio of variances) interacts with SNR to create a sharp phase diagram for mean estimation. Under correct specification (ρ = 1) the maximum-likelihood means recover the truth for any SNR. Under under-smoothing (ρ < 1) the means are displaced from the truth and the squared error scales as SNR^{-1} at low SNR. Under over-smoothing (ρ > 1) the components merge toward the global center once ρ² exceeds 1 + λ SNR, where λ depends on the geometry of the true means. The hard-assignment objective arises as the τ → 0 limit, with corresponding low- and high-SNR bias results, and Bayes-optimal clustering approaches random guessing in low SNR.
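The collapse condition can be made concrete with a short numerical sketch. It uses the form ρ² ≥ 1 + λ_max(Σ_μ)/σ² that is quoted from the paper elsewhere on this page, and it assumes that Σ_μ is the equal-weight covariance of the centered true means. The function name `collapse_threshold` and the example means are illustrative, not taken from the paper.

```python
import numpy as np

def collapse_threshold(true_means, sigma):
    """Predicted value of rho**2 above which the fitted means collapse.

    Assumption: Sigma_mu is the equal-weight covariance of the true means
    after centering, and the threshold is 1 + lambda_max(Sigma_mu) / sigma**2.
    """
    means = np.asarray(true_means, dtype=float)      # shape (K, d)
    centered = means - means.mean(axis=0)            # remove the global center
    sigma_mu = centered.T @ centered / means.shape[0]
    lam_max = np.linalg.eigvalsh(sigma_mu)[-1]       # eigenvalues are ascending
    return 1.0 + lam_max / sigma**2

# Symmetric two-component example: means +mu and -mu with mu = (1, 0).
threshold = collapse_threshold([[1.0, 0.0], [-1.0, 0.0]], sigma=1.0)
print(f"collapse predicted for rho**2 >= {threshold:.3f}")
```

In the symmetric two-component case Σ_μ reduces to μμᵀ, so the threshold is 1 + ‖μ‖²/σ², which is the 1 + λ·SNR form with λ = 1 under the paper's SNR convention for that model.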

What carries the argument

The mismatched likelihood family parameterized by the variance ratio ρ = τ/σ, which governs the transition between unbiased recovery, SNR-scaled displacement, and geometry-dependent collapse.

If this is right

Matched variance (ρ = 1) guarantees that maximum-likelihood means equal the true means at every SNR.
Under-variance misspecification produces mean displacement whose squared error grows inversely with SNR in the low-SNR regime.
Over-variance misspecification causes distinct means to collapse once ρ² exceeds an SNR-proportional threshold determined by mean geometry.
Hard assignment, recovered as the τ → 0 limit, exhibits the same low-SNR bias and fails to recover true labels when SNR is small.
Bayes-optimal clustering performance approaches random guessing in low SNR, independent of the variance choice.
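The hard-assignment bias in the list above can be illustrated in the simplest possible setting. The sketch below assumes a one-dimensional, equal-weight, two-component mixture with means ±μ and a decision boundary at zero; under those assumptions the symmetric stationary point of a nearest-center (hard-assignment) objective places the centers at ±E|x|, the mean of a folded normal. The function name `hard_assignment_target` is hypothetical, and this is not claimed to be the paper's derivation.

```python
import math

def hard_assignment_target(mu: float, sigma: float) -> float:
    """E|x| for x ~ 0.5*N(mu, sigma^2) + 0.5*N(-mu, sigma^2).

    This is the folded-normal mean; by symmetry both components give
    the same value, so the mixture average equals the single-component one.
    """
    gaussian_part = sigma * math.sqrt(2.0 / math.pi) * math.exp(-mu**2 / (2.0 * sigma**2))
    signal_part = mu * math.erf(mu / (sigma * math.sqrt(2.0)))
    return gaussian_part + signal_part

sigma = 1.0
for mu in (0.1, 1.0, 5.0):
    target = hard_assignment_target(mu, sigma)
    snr = (mu / sigma) ** 2
    print(f"SNR={snr:6.2f}  true mean={mu:.2f}  hard-assignment target={target:.4f}")
```

As μ/σ shrinks, the target tends to σ·√(2/π) rather than to μ, so the center stays far from the true mean at low SNR, while at high SNR the two nearly coincide.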

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In low-SNR applications, jointly estimating the variance alongside the means may be necessary to avoid the bias regimes identified here.
The phase diagram supplies a diagnostic: systematic growth of mean error with decreasing SNR can indicate under-smoothing, while sudden collapse can indicate over-smoothing.
The results suggest that variance mismatch effects may compound when the number of components is also unknown, though that case lies outside the present analysis.

Load-bearing premise

Observations are generated exactly from a Gaussian mixture with one fixed true variance, and the estimator uses a fixed mismatched variance from the same family.

What would settle it

Simulate data from a known two-component Gaussian mixture at controlled SNR values, fit the mismatched likelihood for several fixed ρ, and check three things: whether the estimated means stay at the true locations only for ρ = 1, whether the squared error scales as SNR^{-1} at low SNR for ρ < 1, and whether collapse begins at the predicted ρ threshold for ρ > 1.
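A minimal version of this experiment might look as follows. It is a sketch under stated assumptions, not the authors' code: the data are one-dimensional with true means ±μ and equal weights, the fit is a means-only EM iteration with the variance held fixed at τ² = (ρσ)², and the function name `fit_means_fixed_variance`, the sample size, and the seed are all illustrative choices. With μ = σ = 1 the symmetric two-component threshold would be ρ² = 2, so ρ = 2 lies in the predicted collapse regime.

```python
import numpy as np

def fit_means_fixed_variance(x, tau, init, n_iter=300):
    """Means-only EM for an equal-weight 1-D Gaussian mixture.

    The component variance is held fixed at tau**2 (the possibly
    misspecified value); only the component means are updated.
    """
    means = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        # E-step: responsibilities under the assumed variance tau**2.
        logits = -((x[:, None] - means[None, :]) ** 2) / (2.0 * tau**2)
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted sample means.
        means = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return np.sort(means)

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 50_000
labels = rng.choice([-1.0, 1.0], size=n)              # latent component signs
x = labels * mu + sigma * rng.standard_normal(n)      # true variance sigma**2

results = {}
for rho in (0.5, 1.0, 2.0):
    # Start every fit at the true means so differences come from rho alone.
    results[rho] = fit_means_fixed_variance(x, tau=rho * sigma, init=[-mu, mu])
    print(f"rho={rho}: fitted means = {results[rho]}")
```

Repeating the loop over a grid of μ/σ values would trace out the SNR dependence; checking the SNR^{-1} scaling itself needs a log-log fit over several low-SNR points, which this sketch leaves out.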

Figures

Figures reproduced from arXiv: 2605.02448 by Amnon Balanov, Tamir Bendory, Vladimir Serov.

**Figure 1.** Variance-mismatch phase diagram for component mean estimation.

**Figure 2.** Low-SNR Gaussian mixture mean estimation: maximum-likelihood versus …

**Figure 3.** Clustering vs. mean estimation across SNR for a two-component GMM (…

Original abstract

We study estimation and clustering in Gaussian mixture models under variance misspecification. Observations are generated with true variance $\sigma^2$, while the component means are estimated using a likelihood with variance $\tau^2$, yielding a family of mismatched likelihood functions parameterized by the ratio $\rho=\tau/\sigma$. We show that the interplay between $\rho$ and the signal-to-noise ratio (SNR) induces a sharp phase diagram. Under correct specification ($\rho=1$), maximum likelihood recovers the true means, independently of the SNR. However, once the model is misspecified, two different regimes emerge. Under under-smoothing ($\rho<1$), the estimated Gaussian means are displaced from the truth, and in low SNR this discrepancy grows as the SNR decreases: for every fixed $\rho<1$, the squared error scales as $\mathrm{SNR}^{-1}$. Under over-smoothing ($\rho>1$), the fitted likelihood blurs the cluster separation, causing distinct component means to collapse towards the overall mixture center once $\rho^2$ exceeds a threshold of the form $1 + \lambda\,\mathrm{SNR}$, where $\lambda$ depends on the geometry of the true means. We further show that the hard assignment objective arises as the limit $\tau\to 0$ of the same mismatched likelihood family, and derive corresponding low- and high-SNR results for hard-assignment mean estimation and latent-label recovery. Furthermore, in low SNR, Bayes-optimal clustering is close to random guessing, and the hard-assignment target remains far from the true means. These results show that in low-SNR applications, even mild variance misspecification or hard-assignment procedures can induce substantial bias, whereas in high SNR these effects are largely absent.
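For readers who want the setup in symbols, one way to write the mismatched population objective described in the abstract is sketched below. The equal weights 1/K, the isotropic covariance, and the ambient dimension d are assumptions made here for concreteness; the abstract itself only fixes the true variance σ², the assumed variance τ², and the ratio ρ.

```latex
% Sketch under stated assumptions: equal weights 1/K, isotropic components in R^d.
x \sim \frac{1}{K}\sum_{\ell=1}^{K} \mathcal{N}\!\left(\mu_\ell^\star,\, \sigma^2 I_d\right),
\qquad
L_\tau(\mu;\mu^\star)
  = -\,\mathbb{E}_{x}\!\left[
      \log\!\left(
        \frac{1}{K}\sum_{\ell=1}^{K}
        \frac{1}{(2\pi\tau^2)^{d/2}}
        \exp\!\left(-\frac{\|x-\mu_\ell\|^2}{2\tau^2}\right)
      \right)
    \right],
\qquad
\rho = \frac{\tau}{\sigma}.
```

The estimator studied is then a minimizer of L_τ over μ = (μ_1, …, μ_K), with ρ = 1 giving the correctly specified case.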

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Variance misspecification in GMMs triggers SNR-dependent regimes with explicit scaling and collapse thresholds.

The main thing to know is that this paper maps how the variance ratio ρ interacts with SNR to produce distinct estimation behaviors in Gaussian mixtures. Correct specification recovers the means exactly at any SNR. Under-smoothing pushes the estimates away from the truth, with squared error scaling as SNR^{-1} in the low-SNR limit. Over-smoothing causes collapse once ρ² exceeds 1 + λ·SNR, where λ depends on the mean geometry. The authors also recover the hard-assignment limit as τ → 0 and note that Bayes-optimal clustering is near random in low SNR anyway.

Referee Report

0 major / 2 minor

Summary. The manuscript analyzes population-level maximum likelihood estimation and clustering in K-component Gaussian mixture models under variance misspecification. Observations are drawn from a true GMM with variance σ², but estimation uses a mismatched likelihood with variance τ², parameterized by the ratio ρ = τ/σ. The central result is a phase diagram in the (ρ, SNR) plane with three regimes: exact recovery of the true means at ρ = 1 for any SNR; for ρ < 1, displacement of the estimated means whose squared error scales as SNR^{-1} in the low-SNR limit; and for ρ > 1, collapse of distinct component means toward the global center once ρ² exceeds the threshold 1 + λ·SNR (with λ depending on the geometry of the true means). The hard-assignment objective is recovered as the τ → 0 limit of the same family, with corresponding low- and high-SNR characterizations. The work also contrasts these estimators with Bayes-optimal clustering in the low-SNR regime.

Significance. If the derivations hold, the paper supplies a clean, self-contained population-level characterization of how variance misspecification interacts with SNR to produce sharp transitions between unbiased recovery, systematic bias, and mean collapse. The explicit scaling laws (SNR^{-1} bias for under-smoothing) and the collapse threshold are directly usable for diagnosing when misspecification becomes consequential in low-SNR applications. Deriving hard assignment as the zero-temperature limit of the mismatched likelihood family is a useful unifying observation. The analysis is internally consistent and avoids circularity by working exclusively with the expected mismatched log-likelihood.

minor comments (2)

The dependence of the collapse threshold constant λ on the geometry of the true means is stated but not displayed explicitly; adding the functional form (or a short derivation) in the main text or an appendix would improve readability without altering the central claims.
Notation for the signal-to-noise ratio (SNR) and the precise definition of the low-SNR and high-SNR regimes should be introduced once, early in the manuscript, with a clear reference to the underlying scaling (e.g., separation of means relative to σ).

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our manuscript, as well as the recommendation for minor revision. The description of the phase diagram, scaling laws, and connections to hard assignment and Bayes-optimal clustering correctly reflects our contributions.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper conducts a population-level analysis of the mismatched MLE by minimizing the expected negative log-likelihood of a Gaussian mixture under variance misspecification parameterized by ρ = τ/σ. All stated results (the exact recovery at ρ = 1 for any SNR, the SNR^{-1} bias scaling for ρ < 1 in low SNR, the collapse threshold for ρ > 1, and the hard-assignment limit as τ → 0) follow directly from this minimization and standard properties of the Gaussian likelihood. No step reduces by construction to a fitted parameter, self-citation chain, or renamed input; the framework is internally consistent and does not rely on data-dependent fits presented as predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions for Gaussian mixture models and the definition of the mismatched likelihood; no free parameters are fitted to data in the described results.

axioms (2)

domain assumption: Observations are generated from a Gaussian mixture model with true variance σ².
Core modeling assumption stated in the abstract for the misspecification setup.
domain assumption: The estimation uses a likelihood with variance τ², defining ρ = τ/σ.
Defines the family of mismatched likelihoods central to the phase diagram.

pith-pipeline@v0.9.0 · 5621 in / 1594 out tokens · 56836 ms · 2026-05-08T18:22:54.166948+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Cost.FunctionalEquation washburn_uniqueness_aczel unclear

We consider a broader family of variance-mismatched objectives in which estimation is performed with an algorithmic variance τ² that may differ from the true variance σ². ... A natural scale-free parameter is the mismatch ratio ρ ≜ τ/σ.
IndisputableMonolith.Foundation.BranchSelection branch_selection unclear

ρ² ≥ 1 + λ_max(Σ_μ)/σ² ... collapse threshold of the form 1 + λ·SNR.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 2 canonical work pages

[1] Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, 1964.

[2] Martin Azizyan, Aarti Singh, and Larry Wasserman. Minimax theory for high-dimensional gaussian mixtures with sparse mean separation. Advances in Neural Information Processing Systems, 26, 2013.

[3] Olivier Bachem, Mario Lucic, S Hamed Hassani, and Andreas Krause. Uniform deviation bounds for k-means clustering. In International conference on machine learning, pages 283–291. PMLR, 2017.

[4] Amnon Balanov, Tamir Bendory, and Wasim Huleihel. Confirmation bias in gaussian mixture models. IEEE Transactions on Information Theory, 71(11):8871–8898, 2025.

[5] David Barber. Bayesian reasoning and machine learning. Cambridge University Press, 2012.

[6] Tamir Bendory, Alberto Bartesaghi, and Amit Singer. Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities. IEEE signal processing magazine, 37(2):58–76, 2020.

[7] Christopher M Bishop and Nasser M Nasrabadi. Pattern recognition and machine learning, volume 4. Springer, 2006.

[8] Tamara Broderick, Brian Kulis, and Michael Jordan. MAD-Bayes: MAP-based asymptotic derivations from Bayes. In International Conference on Machine Learning, pages 226–234. PMLR, 2013.

[9] Gilles Celeux and Gérard Govaert. A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis, 14(3):315–332, 1992.

[10] Gilles Celeux and Gérard Govaert. Gaussian parsimonious clustering models. Pattern recognition, 28(5):781–793, 1995.
[11] Muyuan Chen, James M Bell, Xiaodong Shi, Stella Y Sun, Zhao Wang, and Steven J Ludtke. A complete data processing workflow for cryo-ET and subtomogram averaging. Nature methods, 16(11):1161–1168, 2019.

[12] Thomas M Cover. Elements of information theory. John Wiley & Sons, 1999.

[13] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society: series B (methodological), 39(1):1–22, 1977.

[14] Raaz Dwivedi, Koulik Khamaru, Martin J Wainwright, Michael I Jordan, et al. Theoretical guarantees for EM under misspecified Gaussian mixture models. Advances in neural information processing systems, 31, 2018.

[15] Edward W Forgy. Cluster analysis of multivariate data: efficiency versus interpretability of classifications. biometrics, 21:768–769, 1965.

[16] Chris Fraley and Adrian E Raftery. Model-based clustering, discriminant analysis, and density estimation. Journal of the American statistical Association, 97(458):611–631, 2002.

[17] Brianna C Heggeseth and Nicholas P Jewell. The impact of covariance misspecification in multivariate gaussian mixtures on estimation and inference: an application to longitudinal modeling. Statistics in medicine, 32(16):2790–2803, 2013.

[18] Anil K Jain. Data clustering: 50 years beyond k-means. Pattern recognition letters, 31(8):651–666, 2010.

[19] Ke Jiang, Brian Kulis, and Michael Jordan. Small-variance asymptotics for exponential family Dirichlet process mixture models. Advances in Neural Information Processing Systems, 25, 2012.

[20] Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant. Efficiently learning mixtures of two gaussians. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 553–562, 2010.

[21] Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.

[22] Roy R Lederman, David Silva-Sánchez, Ziling Chen, Gilles Mordant, Amnon Balanov, and Tamir Bendory. The catastrophic failure of the k-means algorithm in high dimensions, and how hartigan’s algorithm avoids it. arXiv preprint arXiv:2602.09936, 2026.

[23] Fred C. Leone, Lloyd S. Nelson, and R. B. Nottingham. The folded normal distribution. Technometrics, 3(4):543–550, 1961.
[24] Clément Levrard. Nonasymptotic bounds for vector quantization in hilbert spaces. The Annals of Statistics, 43(2):592–619, 2015.

[25] Stuart Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.

[26] Yungtai Lo. Bias from misspecification of the component variances in a normal mixture. Computational statistics & data analysis, 55(9):2739–2747, 2011.

[27] Matthias Löffler, Anderson Y Zhang, and Harrison H Zhou. Optimality of spectral clustering in the gaussian mixture model. The Annals of Statistics, 49(5):2506–2530, 2021.

[28] Yu Lu and Harrison H Zhou. Statistical and computational guarantees of lloyd’s algorithm and its variants. arXiv preprint arXiv:1612.02099, 2016.

[29] Jörg Lücke and Dennis Forster. K-means as a variational EM approximation of Gaussian mixture models. Pattern Recognition Letters, 125:349–356, 2019.

[30] Dmitry Lyumkis. Challenges and opportunities in cryo-EM single-particle analysis. Journal of Biological Chemistry, 294(13):5181–5197, 2019.

[31] David JC MacKay. Information theory, inference and learning algorithms. Cambridge university press, 2003.

[32] Geoffrey J McLachlan, Sharon X Lee, and Suren I Rathnayake. Finite mixture models. Annual review of statistics and its application, 6(1):355–378, 2019.

[33] James B McQueen. Some methods of classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symposium on Math. Stat. and Prob., pages 281–297, 1967.

[34] Ankur Moitra and Gregory Valiant. Settling the polynomial learnability of mixtures of gaussians. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 93–102. IEEE, 2010.

[35] Mohamed Ndaoud. Sharp optimal recovery in the two component Gaussian mixture model. The Annals of Statistics, 50(4):2096–2126, 2022.
[36] David Pollard. Strong consistency of k-means clustering. The annals of statistics, pages 135–140, 1981.

[37] Miroslava Schaffer, Stefan Pfeffer, Julia Mahamid, Stephan Kleindiek, Tim Laugks, Sahradha Albert, Benjamin D Engel, Andreas Rummel, Andrew J Smith, Wolfgang Baumeister, et al. A cryo-FIB lift-out technique enables molecular-resolution cryo-ET within native Caenorhabditis elegans tissue. Nature methods, 16(8):757–762, 2019.

[38] Sjors HW Scheres. RELION: implementation of a Bayesian approach to cryo-EM structure determination. Journal of structural biology, 180(3):519–530, 2012.

[39] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, NY, 2009.

[40] Santosh Vempala and Grant Wang. A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, 68(4):841–860, 2004.

[41] Gili Weiss-Dicker, Amitay Eldar, Yoel Shkolinsky, and Tamir Bendory. Unsupervised particle sorting for cryo-EM using probabilistic PCA. In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), pages 1–5. IEEE, 2023.

[42] Halbert White. Maximum likelihood estimation of misspecified models. Econometrica: Journal of the econometric society, pages 1–25, 1982.

Appendix A Preliminaries. A.1 Hessian at the origin configuration. Here we analyze the population objective at the origin configuration $\mu = 0$. Throughout, we assume that the true means are centered, namely, $\bar{\mu}^\star = 0$ (equivalent...

[43] Moreover, this integrand is bounded in absolute value by $\log K$. The conclusion follows from dominated convergence. Step 3: Convergence of minimizers. Define the rescaled objective $F_\tau(\mu) \triangleq 2\tau^2 L_\tau(\mu;\mu^\star) - d\tau^2 \log(2\pi\tau^2) - 2\tau^2 \log K$, which has the same minimizers as $L_\tau$ over $U$. By (2.19), $F_\tau(\mu) = \Phi(\mu) + 2\tau^2 r_\tau(\mu)$. Hence, by (A.22), $\sup_{\mu\in U} |F_\tau(\mu) - \Phi(\mu)| \le 2\tau^2 \log K$. (A....

[44] $= (\mu, -\mu)$, we have $\bar{\mu}^\star = 0$ and $\Sigma_\mu = \tfrac{1}{2}\left(\mu\mu^\top + (-\mu)(-\mu)^\top\right) = \mu\mu^\top$. (B.14) Hence $\lambda_{\max}(\Sigma_\mu) = \|\mu\|^2$. Since in this symmetric two-component model $\mathrm{SNR} = \|\mu\|^2/\sigma^2$, the claim follows immediately from Proposition 3.1. B.3 Proof of Corollary 3.4. Since $\mu^\star_\ell = \beta v_\ell$ and $\sum_\ell v_\ell = 0$, we have $\bar{\mu}^\star = 0$, and therefore $\Sigma_\mu = \frac{\beta^2}{K}\sum_{\ell\in[K]} v_\ell v_\ell^\top$. (B.15) For a regular simplex, $\frac{1}{K}\sum_{\ell\in\ldots}$

[1] [1]

Stegun.Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables

Milton Abramowitz and Irene A. Stegun.Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, 1964

1964

[2] [2]

Minimax theory for high- dimensional gaussian mixtures with sparse mean separation.Advances in Neural Infor- mation Processing Systems, 26, 2013

Martin Azizyan, Aarti Singh, and Larry Wasserman. Minimax theory for high- dimensional gaussian mixtures with sparse mean separation.Advances in Neural Infor- mation Processing Systems, 26, 2013

2013

[3] [3]

Uniform devi- ation bounds for k-means clustering

Olivier Bachem, Mario Lucic, S Hamed Hassani, and Andreas Krause. Uniform devi- ation bounds for k-means clustering. InInternational conference on machine learning, pages 283–291. PMLR, 2017

2017

[4] [4]

Confirmation bias in gaussian mixture models.IEEE Transactions on Information Theory, 71(11):8871–8898, 2025

Amnon Balanov, Tamir Bendory, and Wasim Huleihel. Confirmation bias in gaussian mixture models.IEEE Transactions on Information Theory, 71(11):8871–8898, 2025

2025

[5] [5]

Cambridge University Press, 2012

David Barber.Bayesian reasoning and machine learning. Cambridge University Press, 2012

2012

[6] [6]

Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities.IEEE signal processing magazine, 37(2):58–76, 2020

Tamir Bendory, Alberto Bartesaghi, and Amit Singer. Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities.IEEE signal processing magazine, 37(2):58–76, 2020

2020

[7] [7]

Springer, 2006

Christopher M Bishop and Nasser M Nasrabadi.Pattern recognition and machine learn- ing, volume 4. Springer, 2006

2006

[8] [8]

MAD-Bayes: MAP-based asymp- totic derivations from Bayes

Tamara Broderick, Brian Kulis, and Michael Jordan. MAD-Bayes: MAP-based asymp- totic derivations from Bayes. InInternational Conference on Machine Learning, pages 226–234. PMLR, 2013

2013

[9] [9]

A classification EM algorithm for clustering and two stochastic versions.Computational Statistics & Data Analysis, 14(3):315–332, 1992

Gilles Celeux and G´ erard Govaert. A classification EM algorithm for clustering and two stochastic versions.Computational Statistics & Data Analysis, 14(3):315–332, 1992

1992

[10] [10]

Gaussian parsimonious clustering models.Pattern recognition, 28(5):781–793, 1995

Gilles Celeux and G´ erard Govaert. Gaussian parsimonious clustering models.Pattern recognition, 28(5):781–793, 1995. 25

1995

[11] [11]

A complete data processing workflow for cryo-ET and subtomogram averaging

Muyuan Chen, James M Bell, Xiaodong Shi, Stella Y Sun, Zhao Wang, and Steven J Ludtke. A complete data processing workflow for cryo-ET and subtomogram averaging. Nature methods, 16(11):1161–1168, 2019

2019

[12] [12]

John Wiley & Sons, 1999

Thomas M Cover.Elements of information theory. John Wiley & Sons, 1999

1999

[13] [13]

Maximum likelihood from incomplete data via the EM algorithm.Journal of the royal statistical society: series B (methodological), 39(1):1–22, 1977

Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm.Journal of the royal statistical society: series B (methodological), 39(1):1–22, 1977

1977

[14] [14]

The- oretical guarantees for EM under misspecified Gaussian mixture models.Advances in neural information processing systems, 31, 2018

Raaz Dwivedi, Koulik Khamaru, Martin J Wainwright, Michael I Jordan, et al. The- oretical guarantees for EM under misspecified Gaussian mixture models.Advances in neural information processing systems, 31, 2018

2018

[15] [15]

Cluster analysis of multivariate data: efficiency versus interpretability of classifications.biometrics, 21:768–769, 1965

Edward W Forgy. Cluster analysis of multivariate data: efficiency versus interpretability of classifications.biometrics, 21:768–769, 1965

1965

[16] [16]

Model-based clustering, discriminant analysis, and density estimation.Journal of the American statistical Association, 97(458):611–631, 2002

Chris Fraley and Adrian E Raftery. Model-based clustering, discriminant analysis, and density estimation.Journal of the American statistical Association, 97(458):611–631, 2002

2002

[17] [17]

Brianna C Heggeseth and Nicholas P Jewell. The impact of covariance misspecifica- tion in multivariate gaussian mixtures on estimation and inference: an application to longitudinal modeling.Statistics in medicine, 32(16):2790–2803, 2013

2013

[18] [18]

Data clustering: 50 years beyond k-means.Pattern recognition letters, 31(8):651–666, 2010

Anil K Jain. Data clustering: 50 years beyond k-means.Pattern recognition letters, 31(8):651–666, 2010

2010

[19] [19]

Small-variance asymptotics for exponential family Dirichlet process mixture models.Advances in Neural Information Processing Systems, 25, 2012

Ke Jiang, Brian Kulis, and Michael Jordan. Small-variance asymptotics for exponential family Dirichlet process mixture models.Advances in Neural Information Processing Systems, 25, 2012

2012

[20] [20]

Efficiently learning mixtures of two gaussians

Adam Tauman Kalai, Ankur Moitra, and Gregory Valiant. Efficiently learning mixtures of two gaussians. InProceedings of the forty-second ACM symposium on Theory of computing, pages 553–562, 2010

2010

[21] [21]

Brian Kulis and Michael I. Jordan. Revisitingk-means: New algorithms via Bayesian nonparametrics. InProceedings of the 29th International Conference on Machine Learn- ing (ICML), 2012

2012

[22] [22]

The catastrophic failure of the k-means algorithm in high dimen- sions, and how hartigan’s algorithm avoids it.arXiv preprint arXiv:2602.09936, 2026

Roy R Lederman, David Silva-S´ anchez, Ziling Chen, Gilles Mordant, Amnon Balanov, and Tamir Bendory. The catastrophic failure of the k-means algorithm in high dimen- sions, and how hartigan’s algorithm avoids it.arXiv preprint arXiv:2602.09936, 2026

work page arXiv 2026

[23] [23]

Leone, Lloyd S

Fred C. Leone, Lloyd S. Nelson, and R. B. Nottingham. The folded normal distribution. Technometrics, 3(4):543–550, 1961

1961

[24] [24]

Nonasymptotic bounds for vector quantization in hilbert spaces.The Annals of Statistics, 43(2):592–619, 2015

Cl´ ement Levrard. Nonasymptotic bounds for vector quantization in hilbert spaces.The Annals of Statistics, 43(2):592–619, 2015. 26

2015

[25] [25]

Least squares quantization in PCM.IEEE Transactions on Information Theory, 28(2):129–137, 1982

Stuart Lloyd. Least squares quantization in PCM.IEEE Transactions on Information Theory, 28(2):129–137, 1982

1982

[26] [26]

Bias from misspecification of the component variances in a normal mixture

Yungtai Lo. Bias from misspecification of the component variances in a normal mixture. Computational statistics & data analysis, 55(9):2739–2747, 2011

[27]

Matthias Löffler, Anderson Y Zhang, and Harrison H Zhou. Optimality of spectral clustering in the Gaussian mixture model. The Annals of Statistics, 49(5):2506–2530, 2021

[28]

Yu Lu and Harrison H Zhou. Statistical and computational guarantees of Lloyd’s algorithm and its variants. arXiv preprint arXiv:1612.02099, 2016

[29]

Jörg Lücke and Dennis Forster. K-means as a variational EM approximation of Gaussian mixture models. Pattern Recognition Letters, 125:349–356, 2019

[30]

Dmitry Lyumkis. Challenges and opportunities in cryo-EM single-particle analysis. Journal of Biological Chemistry, 294(13):5181–5197, 2019

[31]

David JC MacKay. Information theory, inference and learning algorithms. Cambridge University Press, 2003

[32]

Geoffrey J McLachlan, Sharon X Lee, and Suren I Rathnayake. Finite mixture models. Annual Review of Statistics and Its Application, 6(1):355–378, 2019

[33]

James B McQueen. Some methods of classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symposium on Math. Stat. and Prob., pages 281–297, 1967

[34]

Ankur Moitra and Gregory Valiant. Settling the polynomial learnability of mixtures of Gaussians. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 93–102. IEEE, 2010

[35]

Mohamed Ndaoud. Sharp optimal recovery in the two component Gaussian mixture model. The Annals of Statistics, 50(4):2096–2126, 2022

[36]

David Pollard. Strong consistency of k-means clustering. The Annals of Statistics, pages 135–140, 1981

[37]

Miroslava Schaffer, Stefan Pfeffer, Julia Mahamid, Stephan Kleindiek, Tim Laugks, Sahradha Albert, Benjamin D Engel, Andreas Rummel, Andrew J Smith, Wolfgang Baumeister, et al. A cryo-FIB lift-out technique enables molecular-resolution cryo-ET within native Caenorhabditis elegans tissue. Nature Methods, 16(8):757–762, 2019

[38]

Sjors HW Scheres. RELION: implementation of a Bayesian approach to cryo-EM structure determination. Journal of Structural Biology, 180(3):519–530, 2012

[39]

Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, NY, 2009

[40]

Santosh Vempala and Grant Wang. A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, 68(4):841–860, 2004

[41]

Gili Weiss-Dicker, Amitay Eldar, Yoel Shkolinsky, and Tamir Bendory. Unsupervised particle sorting for cryo-EM using probabilistic PCA. In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), pages 1–5. IEEE, 2023

[42]

Halbert White. Maximum likelihood estimation of misspecified models. Econometrica: Journal of the Econometric Society, pages 1–25, 1982

Appendix A Preliminaries

A.1 Hessian at the origin configuration

Here we analyze the population objective at the origin configuration $\mu = 0$. Throughout, we assume that the true means are centered, namely, $\bar{\mu}^\star = 0$ (equivalent...

[43]

Moreover, this integrand is bounded in absolute value by $\log K$. The conclusion follows from dominated convergence.

Step 3: Convergence of minimizers. Define the rescaled objective
$$F_\tau(\mu) \triangleq 2\tau^2 L_\tau(\mu;\mu^\star) - d\tau^2 \log(2\pi\tau^2) - 2\tau^2 \log K,$$
which has the same minimizers as $L_\tau$ over $U$. By (2.19), $F_\tau(\mu) = \Phi(\mu) + 2\tau^2 r_\tau(\mu)$. Hence, by (A.22),
$$\sup_{\mu \in U} \left| F_\tau(\mu) - \Phi(\mu) \right| \le 2\tau^2 \log K.$$
(A....
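The uniform bound in Step 3 can be checked numerically. The following is a minimal Python sketch, not the authors' code. It assumes that L_τ is the negative log-likelihood of an equal-weight isotropic Gaussian mixture with variance τ², and that Φ(µ) is the hard-assignment objective E min_ℓ ‖x − µ_ℓ‖². Neither definition appears in this excerpt, so both are assumptions. Expectations are replaced by sample averages, and the function name `soft_and_hard_objectives` and the simulated data are illustrative only.

```python
import numpy as np


def soft_and_hard_objectives(x, mu, tau):
    """Empirical F_tau and Phi under the assumed definitions (see lead-in).

    x:   (n, d) samples
    mu:  (K, d) candidate means
    tau: assumed standard deviation of the likelihood
    """
    # Squared distances from every sample to every candidate mean: shape (n, K).
    d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    d2_min = d2.min(axis=1)
    # log sum_l exp(-(d2_l - d2_min) / (2 tau^2)), shifted for numerical stability.
    # Each sum lies in [1, K], so the log lies in [0, log K].
    log_sum = np.log(np.exp(-(d2 - d2_min[:, None]) / (2 * tau**2)).sum(axis=1))
    f_tau = (d2_min - 2 * tau**2 * log_sum).mean()  # rescaled soft objective
    phi = d2_min.mean()                             # hard-assignment objective
    return f_tau, phi


# Hypothetical test data: K components in d dimensions with noise level 0.5.
rng = np.random.default_rng(0)
K, d, n = 3, 2, 2000
mu_true = rng.normal(size=(K, d))
labels = rng.integers(K, size=n)
x = mu_true[labels] + 0.5 * rng.normal(size=(n, d))
mu = rng.normal(size=(K, d))  # arbitrary candidate means, not the truth

gaps = {}
for tau in (1.0, 0.3, 0.1):
    f_tau, phi = soft_and_hard_objectives(x, mu, tau)
    gaps[tau] = (abs(f_tau - phi), 2 * tau**2 * np.log(K))
    print(f"tau={tau}: |F_tau - Phi| = {gaps[tau][0]:.5f}, bound = {gaps[tau][1]:.5f}")
```

Under these assumptions the gap stays below 2τ² log K for every τ and every choice of candidate means, which is what lets minimizers of the soft objective be compared with those of the hard-assignment objective as τ → 0.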

[44]

$= (\mu, -\mu)$, we have $\bar{\mu}^\star = 0$ and
$$\Sigma_\mu = \frac{1}{2}\left(\mu\mu^\top + (-\mu)(-\mu)^\top\right) = \mu\mu^\top. \tag{B.14}$$
Hence $\lambda_{\max}(\Sigma_\mu) = \|\mu\|^2$. Since in this symmetric two-component model $\mathrm{SNR} = \|\mu\|^2/\sigma^2$, the claim follows immediately from Proposition 3.1.

B.3 Proof of Corollary 3.4

Since $\mu^\star_\ell = \beta v_\ell$ and $\sum_\ell v_\ell = 0$, we have $\bar{\mu}^\star = 0$, and therefore
$$\Sigma_\mu = \frac{\beta^2}{K} \sum_{\ell \in [K]} v_\ell v_\ell^\top. \tag{B.15}$$
For a regular simplex, $\frac{1}{K}\sum_{\ell \in}$...