The interplay of signal-to-noise ratio and variance misspecification in Gaussian mixtures
1 Pith paper cites this work.
abstract
We study estimation and clustering in Gaussian mixture models under variance misspecification. Observations are generated with true variance $\sigma^2$, while the component means are estimated using a likelihood with variance $\tau^2$, yielding a family of mismatched likelihood functions parameterized by the ratio $\rho=\tau/\sigma$. We show that the interplay between $\rho$ and the signal-to-noise ratio (SNR) induces a sharp phase diagram. Under correct specification ($\rho=1$), maximum likelihood recovers the true means, independently of the SNR. However, once the model is misspecified, two different regimes emerge. Under under-smoothing ($\rho<1$), the estimated Gaussian means are displaced from the truth, and in low SNR this discrepancy grows as the SNR decreases: for every fixed $\rho<1$, the squared error scales as $\mathrm{SNR}^{-1}$. Under over-smoothing ($\rho>1$), the fitted likelihood blurs the cluster separation, causing distinct component means to collapse towards the overall mixture center once $\rho^2$ exceeds a threshold of the form $1 + \lambda\,\mathrm{SNR}$, where $\lambda$ depends on the geometry of the true means. We further show that the hard assignment objective arises as the limit $\tau\to 0$ of the same mismatched likelihood family, and derive corresponding low- and high-SNR results for hard-assignment mean estimation and latent-label recovery. Furthermore, in low SNR, Bayes-optimal clustering is close to random guessing, and the hard-assignment target remains far from the true means. These results show that in low-SNR applications, even mild variance misspecification or hard-assignment procedures can induce substantial bias, whereas in high SNR these effects are largely absent.
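The hard-assignment limit mentioned in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a one-dimensional mixture with equal component weights, and the function names (`mismatched_loglik`, `hard_assignment_objective`, `scaled_soft_cost`) are hypothetical. It uses the standard log-sum-exp argument: after removing the Gaussian normalising constant, which does not depend on the means, and multiplying by $-\tau^2$, the mismatched log-likelihood differs from the k-means style cost by at most $\tau^2 \log K$, so the two coincide as $\tau \to 0$.

```python
import math

def mismatched_loglik(xs, mus, tau):
    """Average log-likelihood of 1-D data under an equal-weight Gaussian
    mixture with means `mus` and working variance tau**2 (assumed setup)."""
    k = len(mus)
    total = 0.0
    for x in xs:
        # Exponents -(x - mu)^2 / (2 tau^2), combined via a stable log-sum-exp.
        exps = [-(x - mu) ** 2 / (2.0 * tau ** 2) for mu in mus]
        m = max(exps)
        lse = m + math.log(sum(math.exp(e - m) for e in exps))
        total += lse - math.log(k) - 0.5 * math.log(2.0 * math.pi * tau ** 2)
    return total / len(xs)

def hard_assignment_objective(xs, mus):
    """Average k-means style cost: half the squared distance to the nearest mean."""
    return sum(min((x - mu) ** 2 for mu in mus) for x in xs) / (2.0 * len(xs))

def scaled_soft_cost(xs, mus, tau):
    """-tau^2 times the log-likelihood with the mean-independent
    Gaussian normaliser removed."""
    normaliser = -0.5 * math.log(2.0 * math.pi * tau ** 2)
    return -tau ** 2 * (mismatched_loglik(xs, mus, tau) - normaliser)

# Illustrative data and candidate means (arbitrary values, not from the paper).
xs = [-1.2, -0.7, 0.1, 0.9, 1.4]
mus = [-1.0, 1.0]

hard = hard_assignment_objective(xs, mus)
for tau in (1.0, 0.3, 0.1, 0.01):
    soft = scaled_soft_cost(xs, mus, tau)
    # The gap is bounded by tau^2 * log(K) and shrinks as tau decreases.
    print(f"tau={tau:<5} soft={soft:.6f} hard={hard:.6f} gap={soft - hard:.2e}")
```

The bound on the gap holds for any data and any candidate means, which is why the limit applies to the objective as a whole rather than to a particular sample.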
fields
math.ST (1)
years
2026 (1)
verdicts
UNVERDICTED (1)
representative citing papers
The generalized method of moments is (almost) statistically efficient in low-SNR Gaussian latent-variable models
In low-SNR Gaussian latent-variable models, optimally weighted GMoM using minimal-order moments achieves the same leading asymptotic covariance as MLE via matching layerwise expansions of the information operators.