Non-standard boundary behaviour in two-component mixture models

Daniel Xiang; Heather Battey; Peter McCullagh

arxiv: 2407.20162 · v3 · submitted 2024-07-29 · 🧮 math.ST · stat.TH

Pith reviewed 2026-05-23 22:45 UTC · model grok-4.3

keywords: mixture models, boundary behavior, maximum likelihood estimator, domain of attraction, likelihood ratio statistic, heavy-tailed distributions, asymptotic theory, conditional inference

The pith

In a Gaussian-heavy-tailed mixture model the MLE for the mixing weight at the zero boundary is positive with limiting probability 1-1/α.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the large-sample behavior of the maximum likelihood estimator for the mixing proportion θ in the model F_θ = (1-θ)F0 + θF1, with F0 standard normal and F1 a fixed heavy-tailed distribution. Standard asymptotic normality holds inside (0,1), but the boundaries exhibit asymmetric non-standard limits. At θ = 0 the probability that the estimator exceeds zero converges to 1 - 1/α, where α is the domain-of-attraction index of the density ratio under the null distribution. This probability and the conditional distribution of the likelihood ratio statistic are governed by the tail properties of F1. The analysis shows that nonparametric extensions within the same tail class yield no inferential improvement.
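As a concrete reading of the model, the following is a minimal sketch of drawing a sample from F_θ = (1-θ)F0 + θF1. It assumes F1 is standard Cauchy, the choice used in the paper's figures; the function name `sample_mixture` and its arguments are illustrative and do not come from the paper.

```python
import numpy as np


def sample_mixture(n, theta, rng):
    """Draw n values from (1 - theta) * N(0, 1) + theta * Cauchy(0, 1).

    Assumption: F1 is standard Cauchy. Returns the sample and a boolean
    mask marking which observations came from F1.
    """
    from_f1 = rng.random(n) < theta          # component labels
    x = rng.standard_normal(n)               # start with F0 draws
    x[from_f1] = rng.standard_cauchy(int(from_f1.sum()))  # overwrite F1 draws
    return x, from_f1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, from_f1 = sample_mixture(10_000, 0.1, rng)
    print("share drawn from F1:", from_f1.mean())
```

Setting `theta=0.0` gives a pure Gaussian sample, which is the null configuration studied at the left boundary.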

Core claim

On the left boundary θ=0, the limiting probability that the MLE is positive is 1-1/α, with α indexing the domain of attraction of f1(X)/f0(X) for X drawn from F0. Conditionally on the estimator being positive, the likelihood ratio statistic converges in distribution to a limit G that is not chi-squared with one degree of freedom, as determined by the joint limiting behavior of the sample maximum and sample mean.
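The estimator and the likelihood ratio statistic in this claim can be computed directly. The sketch below is not the authors' code: it assumes F1 is standard Cauchy, writes the log-likelihood relative to θ = 0 as l(θ) = ∑ log(1 + θ(r_i − 1)) with r_i = f1(X_i)/f0(X_i), and relies on the concavity of l to locate the maximiser by bisection on the score. All function names are illustrative.

```python
import numpy as np


def density_ratio(x):
    """r(x) = f1(x) / f0(x), assuming f1 standard Cauchy and f0 standard normal."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(2.0 / np.pi) * np.exp(0.5 * x**2) / (1.0 + x**2)


def score(theta, r):
    """Derivative of l(theta) = sum(log(1 + theta * (r - 1)))."""
    return np.sum((r - 1.0) / (1.0 + theta * (r - 1.0)))


def mle_theta(r, tol=1e-10):
    """Maximise the concave log-likelihood over [0, 1] by bisection on the score."""
    if score(0.0, r) <= 0.0:      # decreasing at 0: maximum sits on the left boundary
        return 0.0
    if score(1.0, r) >= 0.0:      # still increasing at 1: maximum on the right boundary
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, r) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lr_statistic(r):
    """Return (theta_hat, 2 * l(theta_hat)); l(0) = 0 by construction."""
    theta_hat = mle_theta(r)
    return theta_hat, 2.0 * np.sum(np.log1p(theta_hat * (r - 1.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = density_ratio(rng.standard_normal(10_000))   # null sample from F0
    print(lr_statistic(r))
```

The statistic is exactly zero whenever the estimate is zero, so the conditional distribution discussed above concerns only samples with a positive estimate.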

What carries the argument

The domain-of-attraction index α of the density ratio f1/f0 under F0, which determines the boundary probability 1-1/α and the form of the conditional null distribution G of the likelihood ratio statistic.

If this is right

For α=1 the rate at which the probability of positivity tends to zero is controlled by the tail heaviness of F1.
Standard chi-squared critical values for the likelihood ratio test are invalid when the estimate is positive.
Extending F1 to the nonparametric class of distributions with equivalent tails provides no additional power or accuracy.
The right boundary at θ=1 recovers the usual 1/2 probability of the estimator falling below the boundary.
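The right-boundary statement in the last item can be checked numerically. A minimal sketch, assuming F1 is standard Cauchy: because the log-likelihood is concave, the estimator falls below 1 exactly when the score at θ = 1 is negative, which is equivalent to the sample mean of f0(X_i)/f1(X_i) exceeding 1 for X_i drawn from F1. The names `inverse_ratio` and `prob_below_one` are illustrative.

```python
import numpy as np


def inverse_ratio(x):
    """f0(x) / f1(x), assuming f0 standard normal and f1 standard Cauchy."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.pi / 2.0) * (1.0 + x**2) * np.exp(-0.5 * x**2)


def prob_below_one(n, reps, seed=0):
    """Monte Carlo estimate of P_1(theta_hat < 1) for samples of size n from F1."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_cauchy(n)
        # score at theta = 1 equals sum(1 - f0/f1); negative iff mean(f0/f1) > 1
        hits += int(inverse_ratio(x).mean() > 1.0)
    return hits / reps


if __name__ == "__main__":
    print(prob_below_one(n=2000, reps=2000))
```

Under this assumption the ratio f0/f1 is bounded with mean 1 under F1, so the central limit theorem applies and the estimate should settle near 1/2, in line with the stated result.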

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This boundary behavior may require modified critical values or conditional inference procedures in mixture model fitting when heavy tails are present.
Similar non-standard limits could appear in other models where one component is an extreme point in the parameter space.
Simulation studies with known α could verify the predicted proportion of positive estimates under the null.

Load-bearing premise

The alternative distribution F1 is completely specified and the density ratio f1/f0 belongs to a domain of attraction with index α between 1 and 2 when sampled under F0.

What would settle it

Generate large samples from F0, compute the MLE θ̂_n many times, and check whether the proportion of positive values approaches 1-1/α for the α implied by the density ratio of the chosen F1.
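A minimal sketch of that check, assuming F1 is standard Cauchy as in the paper's Figure 3. It uses the fact that the log-likelihood l(θ) = ∑ log(1 + θ(r_i − 1)) is concave, so the estimate is positive exactly when the score at zero, ∑(r_i − 1), is positive; no numerical optimisation is needed. The function names, sample sizes, and seed are illustrative choices, not values from the paper.

```python
import numpy as np


def density_ratio(x):
    """r(x) = f1(x) / f0(x), assuming f1 standard Cauchy and f0 standard normal."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(2.0 / np.pi) * np.exp(0.5 * x**2) / (1.0 + x**2)


def prob_positive(n, reps, seed=0):
    """Monte Carlo estimate of P_0(theta_hat > 0) for samples of size n from F0."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        # theta_hat > 0 iff the score at zero is positive, i.e. mean(r) > 1
        hits += int(density_ratio(x).mean() > 1.0)
    return hits / reps


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4):
        print(n, prob_positive(n, reps=2000))
```

For a Cauchy alternative the abstract's α = 1 case applies, so the printed proportions are expected to drift towards zero as n grows, though the paper indicates that the rate depends on the tail of F1 and may be slow.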

Figures

Figures reproduced from arXiv: 2407.20162 by Daniel Xiang, Heather Battey, Peter McCullagh.

**Figure 2.** The distribution $f_{\theta_{\max}}$ is symmetric and trimodal, with zero density at $\pm 1$, when $f_1$ is standard Cauchy and $f_0$ is standard normal. It is an extreme point relative to $N(0, 1)$ in the sense of (16). The density $f_{\theta_{\max}}$ is illustrated in …

**Figure 3.** Simulated probabilities $P(\hat\theta_n > 0)$ based on $2 \times 10^4$ replicates, where $\hat\theta_n$ estimates $\theta$ in the model $(1 - \theta)N(0, 1) + \theta F_1$, with $F_1$ standard Cauchy (left) and standard Laplace (right). … where $c_\kappa$ is a normalizing constant. This is of the form in Theorem 3.1 with $(\beta_0, \beta_1, \delta, \gamma) = (2/(c_\kappa\pi)^2, 2, -1/2, \kappa)$, leading by Corollary 3.2 to the conclusion that $P_0(\hat\theta_n > 0) \sim 2^\kappa \kappa (\log n)^{1-\kappa}$. In particular, for th…

**Figure 4.** Histogram of $R = n\bar Z_n / Z_{(n)}$ (left) and $\sqrt{2 l(\hat\theta_n)/\hat\kappa_n}$ (right) for two thousand simulations of the Gauss-Cauchy mixture model with $n = 10^7$ observations, restricted to samples for which $\bar Z_n > 0$. Theorem 3.2 shows that $R = S_1/Z_{(n)}$ is less than one with high probability for large $n$, in which case $\tilde l$ has a maximum at $\tilde\theta_n Z_{(n)} = R/(1 - R)$. In that case $R = \tilde\theta Z_{(n)}/(1 + \tilde\theta Z_{(n)})$, and the approximate likelihood-ratio…

Original abstract

Consider a binary mixture model of the form $F_\theta = (1-\theta)F_0 + \theta F_1$, where $F_0$ is standard Gaussian and $F_1$ is a completely specified heavy-tailed distribution with the same support. For a sample of $n$ independent and identically distributed values $X_i \sim F_\theta$, the maximum likelihood estimator $\hat\theta_n$ is asymptotically normal provided that $0 < \theta < 1$ is an interior point. This paper investigates the large-sample behaviour for boundary points, which is entirely different and strikingly asymmetric for $\theta=0$ and $\theta=1$. The reason for the asymmetry has to do with typical choices such that $F_0$ is an extreme boundary point and $F_1$ is usually not extreme. On the right boundary, well known results on boundary parameter problems are recovered, giving $\lim \mathbb{P}_1(\hat\theta_n < 1)=1/2$. On the left boundary, $\lim\mathbb{P}_0(\hat\theta_n > 0)=1-1/\alpha$, where $1\leq \alpha \leq 2$ indexes the domain of attraction of the density ratio $f_1(X)/f_0(X)$ when $X\sim F_0$. For $\alpha=1$, which is the most important case in practice, we show how the tail behaviour of $F_1$ governs the rate at which $\mathbb{P}_0(\hat\theta_n > 0)$ tends to zero. A new limit theorem for the joint distribution of the sample maximum and sample mean conditional on positivity establishes multiple inferential anomalies. Most notably, given $\hat\theta_n > 0$, the likelihood ratio statistic has a conditional null limit distribution $G\neq\chi^2_1$ determined by the joint limit theorem. We show through this route that no advantage is gained by extending the single distribution $F_1$ to the nonparametric composite mixture generated by the same tail-equivalence class.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives an asymmetric left-boundary probability of 1-1/α for the MLE and a non-chi-squared conditional limit G for the likelihood ratio under domain-of-attraction conditions on the density ratio.

The main point is that the left-boundary crossing probability for the mixing weight estimator is 1-1/α rather than the usual 1/2, and the likelihood ratio statistic conditional on crossing has limit G that is not chi-squared. This comes from the joint convergence of the normalized sample mean and maximum when the density ratio belongs to a regularly varying domain of attraction with index α between 1 and 2. The right boundary recovers the standard 1/2 result because the light-tailed component sits at an extreme point while the heavy-tailed one does not. For the common case α=1 the rate at which the probability tends to zero is governed by the tail of F1, which is a concrete practical adjustment. The paper also shows that moving to a nonparametric class with the same tail equivalence buys nothing extra for the asymptotics. These limits follow from the concavity of the log-likelihood and the stable-law behavior of the score at zero, with the conditional distribution G obtained from the joint max-mean limit once regular variation holds. The derivations use standard tools from extreme-value theory and do not appear to contain internal contradictions. One limitation is the assumption that F1 is fully known; in most applications both components would be estimated, which could alter the boundary behavior. The paper is aimed at readers who need corrected null distributions for tests or intervals when a mixture component may be absent. The technical claims are grounded enough to merit referee time even if the proofs need tightening on error bounds.

Referee Report

0 major / 2 minor

Summary. The manuscript examines the large-sample boundary behavior of the MLE θ̂_n in the two-component mixture F_θ = (1-θ)F_0 + θF_1, with F_0 standard Gaussian and F_1 a fixed heavy-tailed distribution. It asserts that the interior-point normality result fails at the boundaries, yielding the asymmetric limits lim P_1(θ̂_n < 1) = 1/2 at the right boundary and lim P_0(θ̂_n > 0) = 1-1/α at the left boundary, where α indexes the domain of attraction of the density ratio f_1/f_0 under F_0. A joint limit theorem for the normalized sample maximum and mean, conditional on positivity, is used to obtain a non-standard conditional null limit G for the likelihood-ratio statistic, and the paper concludes that extending F_1 to its tail-equivalence class yields no inferential gain.

Significance. If the stated joint convergence holds, the work supplies a precise description of the asymmetry induced by the choice of F_0 as an extreme point and links the left-boundary probability directly to the regular-variation index α. The explicit use of the concavity of the log-likelihood together with stable-limit theory for the score at zero is a methodological strength, and the demonstration that the nonparametric tail-class extension produces the same G is a useful negative result for practitioners.

minor comments (2)

The abstract states the form of G but does not display the explicit joint characteristic function or the normalizing sequences that define the conditional limit; adding these in the main text would improve readability.
For the α=1 case the manuscript indicates that the tail of F_1 governs the rate at which P_0(θ̂_n > 0) tends to zero, yet no explicit rate expression or auxiliary lemma is referenced; a short display of the relevant slowly-varying function would clarify the claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and accurate summary of the manuscript, as well as the recommendation for minor revision. The significance assessment is appreciated, particularly the recognition of the methodological use of concavity and stable-limit theory. Since no specific major comments are listed in the report, we have no points requiring direct rebuttal or revision at this stage.

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external regular-variation theory

The central claims follow from the concavity of the log-likelihood l(θ) = ∑ log(1 + θ(r_i − 1)) together with the normalized sum and maximum converging jointly under the paper's stated domain-of-attraction assumption on the density ratio (regular variation with index α ∈ [1,2]). These are standard results from stable-law and extreme-value theory applied to the given tail condition; no parameter is fitted inside the paper and then relabeled as a prediction, no self-definition equates the output to the input, and no load-bearing step reduces to a self-citation. The conditional limit G is obtained directly from the joint convergence once regular variation holds, without internal fitting or renaming of known results.
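The concavity step can be written out explicitly. The display below is a short derivation from the stated form of l(θ), with r_i = f_1(X_i)/f_0(X_i); it is a reader's sketch rather than a quotation from the paper.

```latex
% r_i = f_1(X_i)/f_0(X_i); l is the log-likelihood relative to theta = 0
\begin{align*}
l(\theta) &= \sum_{i=1}^n \log\{1 + \theta(r_i - 1)\}, \\
l'(\theta) &= \sum_{i=1}^n \frac{r_i - 1}{1 + \theta(r_i - 1)}, \qquad
l''(\theta) = -\sum_{i=1}^n \frac{(r_i - 1)^2}{\{1 + \theta(r_i - 1)\}^2} \le 0, \\
\hat\theta_n > 0 &\iff l'(0) = \sum_{i=1}^n (r_i - 1) > 0 \iff \bar r_n > 1, \\
\hat\theta_n < 1 &\iff l'(1) = \sum_{i=1}^n \Bigl(1 - \frac{1}{r_i}\Bigr) < 0.
\end{align*}
```

Since F_0 and F_1 share the same support, E_0(r_i) = ∫ f_1 = 1, so the left-boundary event is that a sample mean exceeds its expectation. This is why the limit depends on the domain of attraction of r_i under F_0; at α = 2 the formula 1 − 1/α gives the familiar value 1/2.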

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the existence of a domain of attraction for the density ratio f1/f0 under F0 and on standard extreme-value theory for the joint convergence of the maximum and the mean; no free parameters are fitted inside the paper and no new entities are postulated.

axioms (2)

domain assumption: The density ratio f1(X)/f0(X) belongs to a domain of attraction with index α ∈ [1,2] when X ~ F0. Invoked to obtain the explicit limit 1-1/α at the left boundary.
standard math: Standard results on boundary-parameter problems apply at θ=1. Used to recover the known 1/2 probability at the right boundary.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: lim P0(θ̂n > 0) = 1 − 1/α, where 1 ≤ α ≤ 2 indexes the domain of attraction of the density ratio f1(X)/f0(X) when X ∼ F0

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: new limit theorem for the joint distribution of the sample maximum and sample mean conditional on positivity

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
