Adaptivity of the NPMLE to finitely discrete mixing distributions in Gaussian/Poisson mixtures

Stanislav Volgushev; Yan Zhang

arxiv: 2604.12087 · v1 · submitted 2026-04-13 · 🧮 math.ST · stat.TH

Pith reviewed 2026-05-10 14:47 UTC · model grok-4.3

keywords nonparametric maximum likelihood estimator · mixture models · Gaussian mixtures · Poisson mixtures · parametric rates · adaptivity · demixing · likelihood ratio test

The pith

The NPMLE achieves exact parametric rates for density and posterior mean estimation in Gaussian and Poisson mixtures when the mixing distribution is finitely discrete.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the nonparametric maximum likelihood estimator for mixture models in which each observation arises from a Gaussian or Poisson distribution with an unknown mixing distribution on the parameter. It establishes that this estimator recovers the marginal density of the observations and the posterior mean of the latent parameter at the standard parametric rate of n to the power of minus one half whenever the mixing distribution has finite support inside a fixed bounded set. The same estimator also reaches the optimal rate previously known for recovering the locations and weights in overparameterized finite mixtures. In addition, the likelihood ratio statistic for testing the number of mixture components stays asymptotically tight if and only if the true mixing distribution is finitely discrete.

Core claim

When the true mixing distribution is finitely discrete with support contained in a fixed bounded set, the NPMLE for Gaussian or Poisson mixtures converges at the parametric rate n^{-1/2} for both the marginal density and the posterior mean. It simultaneously attains the optimal demixing rate for recovering the atoms and weights of the mixing distribution. The likelihood ratio test statistic for the number of components is asymptotically tight precisely in the finitely discrete case and diverges otherwise.
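The posterior mean mentioned in the claim has a closed form once the mixing distribution is discrete. The following is a minimal sketch for the Poisson case, not the authors' code: the function names `poisson_pmf` and `posterior_mean` are hypothetical, and in practice the atoms and weights would come from a fitted NPMLE rather than being supplied by hand.

```python
import math

def poisson_pmf(x, theta):
    """Poisson(theta) probability mass at the non-negative integer x (theta > 0)."""
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))

def posterior_mean(x, atoms, weights):
    """E[theta | X = x] when the mixing distribution puts weights[j] on atoms[j]."""
    joint = [w * poisson_pmf(x, t) for t, w in zip(atoms, weights)]
    marginal = sum(joint)  # mixture density f_g(x)
    return sum(t * j for t, j in zip(atoms, joint)) / marginal

if __name__ == "__main__":
    # Hypothetical two-atom mixing distribution, chosen only for illustration.
    for x in (0, 3, 10):
        print(x, posterior_mean(x, atoms=[1.0, 5.0], weights=[0.5, 0.5]))
```

The posterior mean is a ratio of two mixture sums, so any error in the estimated mixing distribution enters through both numerator and denominator. This is one reason the posterior mean rate is stated separately from the density rate.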

What carries the argument

The nonparametric maximum likelihood estimator of the unknown mixing distribution, obtained by maximizing the likelihood over all probability measures on the bounded parameter space.
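In symbols, the estimator can be sketched as follows. The notation $p_\theta$, $f_g$ and $\Theta$ follows the paper excerpts quoted further down this page; writing $\mathcal{P}(\Theta)$ for the set of probability measures on $\Theta$ is an assumption made here for illustration.

```latex
f_g(x) = \int_{\Theta} p_\theta(x)\,\mathrm{d}g(\theta),
\qquad
\hat g \in \operatorname*{arg\,max}_{g \in \mathcal{P}(\Theta)} \sum_{i=1}^{n} \log f_g(X_i).
```

Here $p_\theta$ is the Gaussian or Poisson component density and $f_g$ is the marginal density of the observations under mixing distribution $g$. The density estimate discussed in the claims is then the plug-in $f_{\hat g}$.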

Load-bearing premise

The support of the true mixing distribution lies inside a fixed bounded set.

What would settle it

Simulate data from a finitely discrete two-point mixing distribution inside the bounded set and check whether the NPMLE's marginal density estimation error decays exactly at rate n to the power of minus one half; slower decay would refute the parametric-rate claim.
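The check described above can be sketched in a few lines. This is an illustrative setup under stated assumptions, not the paper's procedure: the NPMLE is approximated by EM on a fixed grid over the bounded set, the error is measured in Hellinger distance, and the atoms (-1, 1), the bound 3, the grid size and all function names are hypothetical choices.

```python
import numpy as np

def gaussian_kernel(x, theta):
    """N(theta, 1) density at every pair (x_i, theta_j); shape (len(x), len(theta))."""
    diff = np.asarray(x, dtype=float)[:, None] - np.asarray(theta, dtype=float)[None, :]
    return np.exp(-0.5 * diff ** 2) / np.sqrt(2.0 * np.pi)

def grid_npmle_weights(x, grid, n_iter=300):
    """EM iterations for mixing weights on a fixed grid (an approximation to the NPMLE)."""
    lik = gaussian_kernel(x, grid)
    w = np.full(len(grid), 1.0 / len(grid))
    for _ in range(n_iter):
        mix = lik @ w                             # fitted marginal density at each x_i
        w = w * (lik.T @ (1.0 / mix)) / len(mix)  # multiplicative EM update, keeps sum(w) = 1
    return w

def hellinger_distance(w_a, atoms_a, w_b, atoms_b, lo=-12.0, hi=12.0, m=4001):
    """Hellinger distance between two Gaussian location mixtures, via a Riemann sum."""
    xs = np.linspace(lo, hi, m)
    f_a = gaussian_kernel(xs, atoms_a) @ np.asarray(w_a, dtype=float)
    f_b = gaussian_kernel(xs, atoms_b) @ np.asarray(w_b, dtype=float)
    h2 = 0.5 * np.sum((np.sqrt(f_a) - np.sqrt(f_b)) ** 2) * (xs[1] - xs[0])
    return float(np.sqrt(h2))

def simulate_error(n, rng, atoms=(-1.0, 1.0), probs=(0.5, 0.5), bound=3.0, grid_size=61):
    """Draw n observations from the two-point mixture and return the error of the fit."""
    theta = rng.choice(np.asarray(atoms), size=n, p=np.asarray(probs))
    x = theta + rng.standard_normal(n)
    grid = np.linspace(-bound, bound, grid_size)
    w_hat = grid_npmle_weights(x, grid)
    return hellinger_distance(w_hat, grid, probs, atoms)

def rate_table(ns=(200, 800, 3200), reps=5, seed=0):
    """Average error and sqrt(n)-scaled error for each sample size."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in ns:
        err = float(np.mean([simulate_error(n, rng) for _ in range(reps)]))
        rows.append((n, err, float(np.sqrt(n) * err)))
    return rows

if __name__ == "__main__":
    for n, err, scaled in rate_table():
        print(f"n={n:5d}  error={err:.4f}  sqrt(n)*error={scaled:.3f}")
```

If the parametric-rate claim holds in this setup, the scaled column should stay roughly stable as n grows. A drift could equally come from the grid approximation, too few EM iterations or too few replications, so a small run like this is only indicative.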

Original abstract

We study the nonparametric maximum likelihood estimator (NPMLE) for Gaussian and Poisson mixture models, assuming the support of the true mixing distribution lies in a fixed bounded set. In this setting, we establish exact parametric rates for both marginal density estimation and the posterior mean when the true mixing distribution is finitely discrete. Moreover, we show that the NPMLE attains the optimal demixing rate previously known for overparameterized finite mixture models. Finally, we identify a new adaptivity phenomenon for inference: the likelihood ratio test statistic is asymptotically tight if and only if the true mixing distribution is finitely discrete.
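One common way to write the likelihood ratio statistic and the tightness property is sketched below. The symbols $\ell_n$ and $L_n(\mathcal{G}, g_0)$ appear in the paper excerpts quoted further down this page; the exact normalization (for instance a factor of 2) and the precise class $\mathcal{G}$ are assumptions here and may differ from the paper's definitions.

```latex
\ell_n(f) = \sum_{i=1}^{n} \log f(X_i),
\qquad
L_n(\mathcal{G}, g_0) = \sup_{g \in \mathcal{G}} \ell_n(f_g) - \ell_n(f_{g_0}),
\qquad
L_n(\mathcal{G}, g_0) = O_P(1)
\iff
\lim_{K \to \infty} \limsup_{n \to \infty} P\bigl(L_n(\mathcal{G}, g_0) > K\bigr) = 0.
```

Under this reading, "asymptotically tight" means the statistic stays bounded in probability as $n$ grows, in contrast to the divergence that the abstract attributes to mixing distributions that are not finitely discrete.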

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gets exact parametric rates for NPMLE density and posterior mean estimation plus an LRT adaptivity result when the mixing distribution is finitely discrete and supported on a known fixed compact set.

The main thing to know is that under the fixed bounded support assumption, the NPMLE achieves the parametric n^{-1/2} rate for marginal density estimation and posterior mean when the mixing measure is finitely discrete. It also matches the best known demixing rate for overparameterized finite mixtures, and the likelihood ratio test statistic is asymptotically tight if and only if the mixing distribution is finitely discrete. That last part is the clearest new observation. Earlier work had slower rates in the general nonparametric case, so the exact rates and the iff statement for the LRT stand out as the advance.

The bounded support is used to discretize the parameter space uniformly and get tight concentration on the log-likelihood, which is how the proofs close. The technical steps look standard but are carried out carefully for both Gaussian and Poisson kernels. The citation pattern covers the relevant finite-mixture and NPMLE literature without obvious gaps.

The main limitation is exactly what the stress-test note flags: everything depends on the support lying in a known fixed compact interval. If that bound is allowed to grow or is unknown, the discretization and tail control no longer work, and neither the parametric rates nor the LRT tightness are guaranteed. The paper states the assumption clearly but does not supply a matching lower bound or counter-example to show it is necessary, so the result remains conditional on that hypothesis.

This is for people working on mixture models and nonparametric MLE asymptotics. A reader who wants precise rates under finite discreteness will get something concrete from it. The formal grounding and the new adaptivity claim are strong enough that it deserves a serious referee even if the support condition needs more discussion in revision.

Referee Report

1 major / 0 minor

Summary. The manuscript studies the nonparametric maximum likelihood estimator (NPMLE) for Gaussian and Poisson mixture models under the assumption that the support of the true mixing distribution lies in a fixed bounded set. It claims to establish exact parametric (n^{-1/2}) rates for marginal density estimation and posterior mean estimation when the mixing distribution is finitely discrete, shows that the NPMLE attains the optimal demixing rate known for overparameterized finite mixtures, and identifies a new adaptivity result in which the likelihood ratio test statistic is asymptotically tight if and only if the mixing distribution is finitely discrete.

Significance. If the derivations hold, the results would be significant for the theory of mixture models by demonstrating that the NPMLE adapts to finite discreteness to achieve parametric rates (rather than slower nonparametric rates) for both density and posterior mean estimation, while also attaining known optimal demixing rates. The LRT tightness criterion offers a potential new diagnostic for discreteness. However, the fixed bounded support assumption is central to all claims and restricts applicability; without lower bounds or counterexamples establishing its necessity, the sharpness of the adaptivity and 'iff' statements remains unclear. No machine-checked proofs or reproducible code are mentioned.

major comments (1)

[Abstract] Abstract and standing assumption: the exact parametric rates for marginal density and posterior mean, the attainment of the optimal demixing rate, and the LRT asymptotic tightness 'iff' claim are all proved under the fixed bounded support hypothesis. This assumption enables the discretization of the parameter space and uniform concentration arguments, but the manuscript supplies neither a matching lower bound nor a counter-example showing that the rates or the 'iff' statement fail when the support bound grows with n or is unknown. This is load-bearing for the central adaptivity claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for recognizing the potential significance of the adaptivity results. We address the major comment on the bounded support assumption below.

Referee: [Abstract] Abstract and standing assumption: the exact parametric rates for marginal density and posterior mean, the attainment of the optimal demixing rate, and the LRT asymptotic tightness 'iff' claim are all proved under the fixed bounded support hypothesis. This assumption enables the discretization of the parameter space and uniform concentration arguments, but the manuscript supplies neither a matching lower bound nor a counter-example showing that the rates or the 'iff' statement fail when the support bound grows with n or is unknown. This is load-bearing for the central adaptivity claims.

Authors: We agree that the fixed bounded support assumption is essential to the analysis: it permits discretization of the mixing parameter space and enables the uniform concentration arguments that deliver the parametric rates. This is a standard modeling choice in the NPMLE literature for mixtures precisely to obtain such sharp results. Our claims are therefore stated and proved under this hypothesis, and we do not assert that the same rates or the 'iff' characterization continue to hold when the bound grows with n or is unknown. Establishing matching lower bounds or counter-examples in those regimes would require different technical tools and is beyond the scope of the present work. We will revise the abstract and introduction to state the assumption more prominently at the outset and will add a short discussion paragraph acknowledging its role and the open question of necessity.

revision: partial

Circularity Check

0 steps flagged

No circularity; results derived from standard MLE asymptotics under explicit bounded-support assumption

The paper's central claims (parametric rates for density and posterior mean estimation, attainment of optimal demixing rates, and LRT tightness iff finite discreteness) are proved via concentration of the log-likelihood process and discretization arguments that rely on the standing assumption of support in a fixed compact set. This assumption is stated upfront and used to control tails and mesh size; it is not derived from the results themselves. No equations reduce a claimed prediction to a fitted quantity by construction, no uniqueness theorems are imported from self-citations in a load-bearing way, and no ansatz is smuggled via prior work. The derivation chain is self-contained against external benchmarks such as classical MLE theory for finite mixtures.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The results rely on standard regularity conditions for mixture models and the explicit bounded-support assumption; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption: The support of the true mixing distribution lies in a fixed bounded set.
Stated in the abstract as the standing assumption under which the parametric rates are established.
domain assumption: Standard regularity conditions hold for the Gaussian and Poisson mixture likelihoods.
Implicit in any asymptotic analysis of NPMLE convergence rates.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fast computation and theoretical guarantees for the NPMLE in exponential family mixtures
math.ST · 2026-04 · unverdicted · novelty 6.0

A data-compression technique reduces NPMLE computation cost to logarithmic in n for exponential family mixtures, while approximate NPMLEs attain near-parametric rates for marginal density estimation.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · cited by 1 Pith paper

[1]

Azaïs, J.-M., Gassiat, É., and Mercadier, C. (2009). The likelihood ratio test for general mixture models with or without structural parameter. ESAIM: Probability and Statistics, 13:301–327.
Banach, S. (1938). Über homogene Polynome in ($L^2$). Studia Mathematica, 7(1):36–44.
Bandeira, A., Niles-Weed, J., and Rigollet, P. (2020). Optimal rates of estimation...
[2]

Han, Y., Niles-Weed, J., Shen, Y., and Wu, Y. (2025). Besting Good–Turing: Optimality of non-parametric maximum likelihood for distribution estimation. arXiv preprint arXiv:2509.07355.
Hartigan, J. A. (1985). A failure of likelihood asymptotics for normal mixtures. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, 1985...
[3]

Ignatiadis, N. and Sen, B. (2025). Empirical partially Bayes multiple testing and compound $\chi^2$ decisions. The Annals of Statistics, 53(1):1–36.
Jana, S., Polyanskiy, Y., and Wu, Y. (2025). Optimal empirical Bayes estimation for the Poisson model via minimum-distance methods. Information and Inference: A Journal of the IMA, 14(4):iaaf027.
Jia, Z., Polyansk...
[4]

Jiang, W. and Zhang, C.-H. (2019). Rate of divergence of the nonparametric likelihood ratio test for Gaussian mixtures. Bernoulli, 25(4B):3400–3420.
Koenker, R. and Gu, J. (2026). Empirical Bayes: Some Tools, Rules, and Duals. Econometric Society Monographs. Cambridge University Press.
Krantz, S. G. (2001). Function theory of several complex variables, volume
[5]

American Mathematical Soc.
Lambert, D. and Tierney, L. (1984). Asymptotic properties of maximum likelihood estimates in the mixed Poisson model. The Annals of Statistics, pages 1388–1399.
Lindsay, B. G. (1989). Moment matrices: applications in mixtures. The Annals of Statistics, 17(2):722–
[6]

Lindsay, B. G. (1995). Mixture models: theory, geometry, and applications, volume

[7]

IMS.
Liu, X. and Shao, Y. (2003). Asymptotics for likelihood ratio tests under loss of identifiability. The Annals of Statistics, 31(3):807–832.
Ma, Y., Wu, Y., and Yang, P. (2025). On the best approximation by finite Gaussian mixtures. IEEE Transactions on Information Theory.
Miao, Z., Kong, W., Vinayak, R. K., Sun, W., and Han, F. (2024). Fisher-pitm...
[8]

Shen, Y. and Wu, Y. (2022). Empirical Bayes estimation: When does g-modeling beat f-modeling in theory (and in practice)? arXiv preprint arXiv:2211.12692.
Soloff, J. A., Guntuboyina, A., and Sen, B. (2025). Multivariate, heteroscedastic empirical Bayes via nonparametric maximum likelihood. Journal of the Royal Statistical Society Series B: Statistical Met...
[9]

Zhang, C.-H. (2009). Generalized maximum likelihood estimation of normal mixture densities. Statistica Sinica, pages 1297–1318.
6 Supplement
6.1 Parametric behavior for distribution with a finite number of support points.
Here we discuss the setting briefly mentioned in Remark 4.2, where the component family $\{p_\theta : \theta \in \Theta\}$ is supported on a finite sample spac...
[10]

star-shaped

Take $g_0$ to be a point mass. If $\Theta$ is unbounded, then for any $c > 0$, $\sup_{g \in \mathcal{G}_n(c)} n\,\chi^2(f_g, f_{g_0}) \to \infty$ in probability. Moreover, $L_n(\mathcal{G}, g_0) \to \infty$ in probability. In the Gaussian case, the divergence of the likelihood ratio statistic was first observed by Hartigan (1985). The proposition above directly extends their finding to the Poisson setting and, moreover, to the dive...
[11]

Its characterization will rely on moment tensors $\{m_{k,g}\}_{k \in \mathbb{N}}$, which generalize univariate moments. For $\theta \in \mathbb{R}^d$, the tensor $\theta^{\otimes k} \in (\mathbb{R}^d)^{\otimes k}$ is the $k$-way array with entries $(\theta^{\otimes k})_{i_1,\dots,i_k} = \prod_{j=1}^{k} \theta_{i_j}$. With this notation, for $g \in \mathcal{G}$ we define $m_{k,g} := \int_\Theta (\theta - \theta_0)^{\otimes k}\,dg(\theta)$. Next, define the orthogonal polynomial family associated with $p_{\theta_0}$ by $q_\alpha(x) := \frac{\partial^\alpha}{\partial\theta^\alpha} \frac{p_\theta(x)}{p_{\theta_0}(x)}$...
[12]

yields the sharper characterization ∥T∥ 2 = sup c∈Sd−1 ⟨T, c⊗k⟩ .(8) Throughout this section, we adopt the notation from Section 6.4 and additionally define M := sup θ∈Θ ∥θ−θ 0∥, where∥ · ∥denotes the Euclidean norm onR d. We assume (GP). Our main goal is to prove Theorem 2.1 and Theorem 3.1. The positive parts of both results hinge on controlling the siz...

[13]

dα! s pθ0 fg0 . Since q pθ0/fg0 is uniformly bounded (by Lemma 6.2), Example 2.10.10 of Van der Vaart and Wellner (1996) implies that it suffices to verify the Donsker property and to identify a square-integrable envelope for the class    ∞X k=1 X |α|=k cα,ghα :g∈ G\g 0    . By Theorem 2.13.2 of Van der Vaart and Wellner (1996), this class isfg0 dµ-...

[14]

4Jk k! # <∞. To summarize, we have established the desired bound Errˆg(x)≤(M+ √ d) p C0C2 S(x) ∆ˆg. Moreover, in the pure Poisson case (b= 0),S(x)is finite for every admissible outcomexbecause S∈L 2(fg0dµ)andµis the counting measure onN. In the pure Gaussian case (b=d), eachq α is a product of Hermite polynomials and satisfies the bound7 |qα(x)|= dY l=1 |...

[15]

Footnote 7: See inequality (18.14.9) in Olver et al. (2024).
Proof of Lemma 6.5. The proof follows the proof strategy of Lemma 3.1 in Doss et al. (2023). We adopt a probabilistic argument. Draw $c$ from the uniform distribution on $S^{d-1}$. For any $x \in \mathbb{R}^d$, by inequality (2.2) in the proof of Lemma 3.1 in Doss et al. (2023), $P(|c^\top x| < t\|x\|) < t\sqrt{d}$. Let $\Theta_1 := \{\theta_1 - \theta_2 : \theta_1, \theta_2$...
[16]

Proposition 6.4. Assume that $g_0$ is supported on a compact set $\Theta$ and is not finitely discrete. Then there exists a sequence of polynomials $\{q_k\}_{k \in \mathbb{N}}$ on $\Theta$ such that $q_0(\theta) \equiv 1$ and, for any $k, k' \in \mathbb{N}$, $\int_\Theta q_k(\theta)\,q_{k'}(\theta)\,dg_0(\theta) = \mathbf{1}\{k = k'\}$.
Proof. Since $g_0$ is not finitely discrete, there must exist an index $l \in [d]$ such that the $l$th marginal of $g_0$ is not finite discrete. For this $l$...
[17]

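Fixing some $\theta_0 > 0$, we take $g_0 = \delta_{\theta_0}$, so that $f_{g_0} = p_{\theta_0}$. As before, we restrict attention to a two-point mixture subfamily, and the corresponding limiting distribution for the quantities of interest retains the form (17), but with a different covariance structure: $\mathrm{Cov}(G_{k_1}, G_{k_2}) = \frac{\exp\bigl((\theta_{k_1} - \theta_0)(\theta_{k_2} - \theta_0)/\theta_0\bigr) - 1}{\sqrt{\bigl(\exp\bigl((\theta_{k_1} - \theta_0)^2/\theta_0\bigr) - 1\bigr)\bigl(\exp\bigl((\theta_{k_2} - \theta_0)^2/\theta_0\bigr) - 1\bigr)}}$. ...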
[18]

For the third term and the denominator, by Example 2.10.7 and Lemma 2.10.14 of Van der Vaart and Wellner (1996), the class $\{(s_-)^2 : s \in \mathcal{S}\}$ is $f_0\,d\mu$-Glivenko–Cantelli in probability. Moreover, we must have $\inf_{s \in \mathcal{S}} \int (s_-)^2 f_0\,d\mu > 0$. Otherwise, there would exist a sequence $\{s_n\}_{n \in \mathbb{N}} \subseteq \mathcal{S}$ with $\int [(s_n)_-]^2 f_0\,d\mu \to 0$. Since $\int (s_n)_+ f_0\,d\mu - \int (s_n)_- f_0\,d\mu = \int s_n f_0\,d\mu = 0$, it follo...
[19]

This completes the proof. 6.10 Proof of Theorem 6.2 Lemma 6.7.Under (SS) and (A1), for anys∈ S, sup f∈F ℓn(f)−ℓ n(f0)≥ 1 2 (Gn(s))+ 2 +o P(1).(21) Proof.By (SS), for anys=s f ∈ Sthere is an associated submodel{f t}t∈[0,τ] ⊆ Fgiven by ft := 1− t χ(f, f0) f0 + t χ(f, f0) f, whereτ :=χ(f, f 0)>0by (A1). Since sup f∈F ℓn(f)−ℓ n(f0)≥sup t∈[0,τ] ℓn(ft)−ℓ n(f0),...

work page 1996
[20]

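$\sum_{i=1}^{n} s_f(X_i) - \frac{1}{2}\chi^2(f, f_0) \sum_{i=1}^{n} s_f^2(X_i) + \chi^2(f, f_0) \sum_{i=1}^{n} s_f^2(X_i)\,R\bigl(\chi(f, f_0)\,s_f(X_i)\bigr)$, where $R$ is a deterministic function satisfying $R(x) \to 0$ as $x \to 0$. Let $S$ be an $f_0\,d\mu$-square-integrable envelope for $\mathcal{S}$. By the union bound and the dominated convergence theorem, for any fixed $\varepsilon > 0$, $P\bigl(\frac{1}{\sqrt{n}} \sup_{f \in \mathcal{F} \setminus f_0} \max_{i \in [n]} |s_f(X_i)| \ge \varepsilon\bigr) \le n\,P\bigl(S^2(X_1) \ge n\varepsilon^2\bigr) \le \frac{1}{\varepsilon^2} \int_{\{x : S^2(x) > n\varepsilon^2\}} S$...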
