Density estimation for Hellinger via minimum-distance estimators: mixtures of Gaussians, log-concave, and more

Jerry Li; Spencer Compton

arxiv: 2606.11469 · v1 · pith:24KGWL6Mnew · submitted 2026-06-09 · 💻 cs.DS · cs.LG· math.ST· stat.TH

Density estimation for Hellinger via minimum-distance estimators: mixtures of Gaussians, log-concave, and more

Spencer Compton , Jerry Li This is my paper

Pith reviewed 2026-06-27 10:56 UTC · model grok-4.3

classification 💻 cs.DS cs.LGmath.STstat.TH

keywords density estimationHellinger distanceminimum-distance estimatormixtures of Gaussianslog-concave densitiesVC dimensionnear-linear timereverse data processing

0 comments

The pith

Minimum-distance estimators extend to Hellinger distance via reverse data processing inequalities, giving near-linear time algorithms for Gaussian mixture and log-concave density estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the classic minimum-distance estimator method, long used for total variation, can be lifted to Hellinger distance once one bounds the VC dimension of a related concept class. The key step is importing recent reverse data processing inequalities to translate that bound into Hellinger guarantees. Because the same VC bound already powered fast total-variation algorithms, the same algorithmic skeleton carries over unchanged. Modifying the procedure of Acharya et al. therefore produces the first near-linear-time learners for univariate mixtures of log-concave densities and mixtures of Gaussians with arbitrary variances, all with near-optimal sample size.

Core claim

By replacing the total-variation Yatracos class with a related concept class whose VC dimension is bounded, and invoking reverse data processing inequalities, the minimum-distance estimator yields an estimator whose Hellinger error is controlled by the same sample size and runtime that previously controlled total-variation error.

What carries the argument

The modified minimum-distance estimator that uses VC-dimension bounds on a related concept class together with reverse data processing inequalities to obtain Hellinger guarantees while preserving the original near-linear runtime.

If this is right

Near-linear time density estimation in Hellinger distance for univariate Gaussian mixtures with arbitrary variances.
Near-linear time density estimation in Hellinger distance for univariate mixtures of log-concave densities.
Near-optimal sample complexity for both classes.
Any future total-variation algorithm whose analysis rests only on a VC-dimension bound can be reused for Hellinger with only the addition of the reverse-inequality step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same recipe may apply to other distances that admit reverse data processing inequalities, such as certain f-divergences.
The approach could be tested on multivariate log-concave mixtures once suitable VC bounds are derived.
If the related concept class admits efficient empirical risk minimization, the overall procedure remains practical beyond the univariate setting.

Load-bearing premise

That a bound on the VC dimension of the related concept class, once fed through reverse data processing inequalities, is enough to control Hellinger error and to keep the algorithm near-linear.

What would settle it

A concrete mixture of two univariate Gaussians with arbitrary variances for which the VC dimension of the associated concept class grows faster than linear in the number of components, or for which the modified Acharya et al. procedure requires super-linear time.

read the original abstract

We study the task of density estimation, where we hope to accurately estimate a probability density from $n$ samples. A textbook method for density estimation in total variation distance is the minimum-distance estimator approach, where we conclude both the algorithm and the analysis merely from bounding the VC dimension of a particular concept class (the so-called Yatracos class). While this technique has originally yielded sharp guarantees primarily for total variation distance, in this work we extend the minimum-distance estimator approach for learning within Hellinger distance. Our main observation is that we may produce an analogous recipe for Hellinger (where we only require bounding the VC dimension of a related concept class) by drawing connections to recent results yielding reverse data processing inequalities. This recipe is flexible enough to accommodate fast algorithms originally designed for total variation distance; by modifying the approach of Acharya et al. (2017) we conclude the first near-linear time algorithm for learning classes including univariate mixtures of log-concave densities and mixtures of Gaussians (with arbitrary variances), with near-optimal sample complexity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adapts min-distance estimators to Hellinger via reverse DPI and claims this yields the first near-linear time algorithms for univariate Gaussian and log-concave mixtures, but the runtime transfer is the part that needs checking.

read the letter

The main takeaway is that they show how to port the minimum-distance estimator recipe to Hellinger distance by bounding the VC dimension of a related concept class and invoking recent reverse data processing inequalities. They then modify the Acharya et al. 2017 procedure to produce the first near-linear time algorithms for univariate mixtures of Gaussians (arbitrary variances) and mixtures of log-concave densities, with near-optimal sample complexity.

What works is the observation that the same VC-based analysis can be lifted once you have the right inequality connecting Hellinger to the concept class. That is a clean way to reuse existing machinery without starting from scratch, and the target classes are practically relevant.

The soft spot is runtime preservation. The original Acharya algorithm gets its speed from structural properties of the total-variation Yatracos class that let them do efficient optimization or enumeration. The paper asserts that the analogous Hellinger class admits the same modification, but the abstract gives no explicit construction or argument showing that the implementation steps remain near-linear once the distance changes. If that step does not go through cleanly, the claimed runtime would not hold even if the sample complexity does.

The sample bounds look plausible if the VC dimension is indeed controlled, and there is no obvious circularity. This is the kind of paper that would interest people working on efficient density estimation for mixture models or on Hellinger-based learning. A reader who already knows the Acharya framework and the reverse DPI results would get the most out of it.

It is worth sending to peer review so the runtime claim can be checked against the actual construction.

Referee Report

2 major / 2 minor

Summary. The paper extends the minimum-distance estimator technique from total variation (TV) distance—where guarantees follow from VC-dimension bounds on the Yatracos class—to Hellinger distance. The key step invokes recent reverse data-processing inequalities to obtain an analogous recipe that again requires only a VC-dimension bound on a related concept class. By adapting the algorithmic framework of Acharya et al. (2017), the authors claim the first near-linear-time algorithms for Hellinger density estimation of univariate mixtures of log-concave densities and of Gaussians with arbitrary variances, attaining near-optimal sample complexity.

Significance. If the central claims hold, the work supplies a modular template for converting existing TV-efficient estimators into Hellinger estimators while preserving both statistical optimality and computational efficiency. This is valuable for the listed nonparametric classes, where Hellinger distance is often the more natural loss and where near-linear runtime had previously been unavailable.

major comments (2)

[Abstract / §3] Abstract and §3 (algorithmic modification): the claim that the modified Acharya et al. (2017) procedure retains near-linear runtime rests on the assertion that the Hellinger analogue of the Yatracos class admits the same efficient optimization or enumeration steps as the TV version. Because the original runtime exploits specific structural properties of the TV class, an explicit argument (or pseudocode) showing that the Hellinger version inherits an equivalent implementation is required; bounded VC dimension alone does not automatically guarantee this.
[Theorem 1] Theorem 1 (or main sample-complexity statement): the paper must state the precise sample complexity obtained for the listed classes and verify that it matches the known Hellinger lower bounds up to logarithmic factors. Without an explicit comparison or derivation of the constant factors arising from the reverse DPI, the “near-optimal” claim cannot be assessed.

minor comments (2)

[§2] Notation for the Hellinger Yatracos class should be introduced with an explicit definition parallel to the TV version, including the precise functional form used in the minimum-distance program.
[§2.2] The dependence on the external reverse-DPI results should be stated as a black-box lemma with a clear citation and a one-sentence reminder of the exact inequality applied.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We appreciate the recognition of the work's potential as a modular template for Hellinger estimators. We address each major comment below and will revise the manuscript accordingly to address the points raised.

read point-by-point responses

Referee: [Abstract / §3] Abstract and §3 (algorithmic modification): the claim that the modified Acharya et al. (2017) procedure retains near-linear runtime rests on the assertion that the Hellinger analogue of the Yatracos class admits the same efficient optimization or enumeration steps as the TV version. Because the original runtime exploits specific structural properties of the TV class, an explicit argument (or pseudocode) showing that the Hellinger version inherits an equivalent implementation is required; bounded VC dimension alone does not automatically guarantee this.

Authors: We agree that the manuscript would benefit from an explicit argument. In the revision we will add a dedicated paragraph and pseudocode in §3 that directly compares the optimization oracle for the Hellinger concept class (induced by the reverse DPI) to the TV Yatracos class, showing that the same enumeration and linear-programming steps from Acharya et al. (2017) apply verbatim because the class retains the same combinatorial structure. revision: yes
Referee: [Theorem 1] Theorem 1 (or main sample-complexity statement): the paper must state the precise sample complexity obtained for the listed classes and verify that it matches the known Hellinger lower bounds up to logarithmic factors. Without an explicit comparison or derivation of the constant factors arising from the reverse DPI, the “near-optimal” claim cannot be assessed.

Authors: We will expand Theorem 1 to state the exact sample complexities (including all logarithmic factors) for univariate Gaussian mixtures and log-concave mixtures. We will also add a short derivation of the multiplicative constants contributed by the reverse DPI and a direct comparison paragraph showing that the resulting bounds match the known Hellinger lower bounds up to logarithmic factors. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on external VC bounds and reverse DPI results

full rationale

The paper extends the minimum-distance estimator technique from total variation (originally from Acharya et al. 2017) to Hellinger distance. The key step is observing that bounding the VC dimension of a related concept class, combined with reverse data processing inequalities from recent external results, yields the Hellinger guarantees while preserving the algorithmic structure. No claim reduces by construction to a quantity defined inside the paper, no fitted parameter is relabeled as a prediction, and no load-bearing step collapses to a self-citation chain. The runtime preservation for mixtures of Gaussians and log-concave densities follows from modifying the external procedure once the VC bound is established. This is a standard non-circular extension relying on independent prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract; the central claim rests on the existence of suitable VC-dimension bounds for a related concept class and on the applicability of recent reverse data processing inequalities. No free parameters or invented entities are mentioned.

axioms (1)

domain assumption Reverse data processing inequalities connect Hellinger distance to the minimum-distance estimator framework for the relevant concept classes
Invoked to produce the Hellinger analogue of the total-variation recipe.

pith-pipeline@v0.9.1-grok · 5724 in / 1271 out tokens · 34242 ms · 2026-06-27T10:56:26.293094+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 3 canonical work pages

[1]

Attainability of two-point testing rates for finite-sample location estimation.arXiv preprint arXiv:2502.05730,

Spencer Compton and Gregory Valiant. Attainability of two-point testing rates for finite-sample location estimation.arXiv preprint arXiv:2502.05730,

arXiv
[2]

Ratio covers of convex sets and optimal mixture density estimation.arXiv preprint arXiv:2602.16142,

Spencer Compton, Gábor Lugosi, Jaouad Mourtada, Jian Qian, and Nikita Zhivotovskiy. Ratio covers of convex sets and optimal mixture density estimation.arXiv preprint arXiv:2602.16142,

arXiv
[3]

Differentially private learning of structured discrete distributions.Advances in Neural Information Processing Systems, 28, 2015a

Ilias Diakonikolas, Moritz Hardt, and Ludwig Schmidt. Differentially private learning of structured discrete distributions.Advances in Neural Information Processing Systems, 28, 2015a. Ilias Diakonikolas, Daniel M Kane, and Vladimir Nikishkin. Optimal algorithms and lower bounds for testing closeness of structured distributions. In2015 IEEE 56th Annual Sy...

Pith/arXiv arXiv
[4]

Near-optimal closeness testing of discrete histogram distributions.arXiv preprint arXiv:1703.01913, 2017a

Ilias Diakonikolas, Daniel M Kane, and Vladimir Nikishkin. Near-optimal closeness testing of discrete histogram distributions.arXiv preprint arXiv:1703.01913, 2017a. Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Learning multivariate log-concave distri- butions. InConference on Learning Theory, pages 711–727. PMLR, 2017b. Ilias Diakonikolas, Da...

Pith/arXiv arXiv
[5]

URL https://doi.org/10.1214/16-AOS1480

doi: 10.1214/16-AOS1480. URL https://doi.org/10.1214/16-AOS1480. Gil Kur, Yuval Dagan, and Alexander Rakhlin. Optimality of maximum likelihood for log-concave density estimation and bounded convex regression.arXiv preprint arXiv:1903.05315,

work page doi:10.1214/16-aos1480 1903
[6]

Nonparametric mle for gaussian location mixtures: certified computation and generic behavior.arXiv preprint arXiv:2503.20193,

Yury Polyanskiy and Mark Sellke. Nonparametric mle for gaussian location mixtures: certified computation and generic behavior.arXiv preprint arXiv:2503.20193,

arXiv
[7]

Self-regularizing property of nonparametric maximum likelihood estimator in mixture models.arXiv preprint arXiv:2008.08244,

Yury Polyanskiy and Yihong Wu. Self-regularizing property of nonparametric maximum likelihood estimator in mixture models.arXiv preprint arXiv:2008.08244,

arXiv 2008
[8]

URLhttps://doi.org/10.1214/ 18-STS666

doi: 10.1214/18-STS666. URLhttps://doi.org/10.1214/ 18-STS666. Richard J Samworth. Nonparametric inference under shape constraints: past, present and future. arXiv preprint arXiv:2509.26040,

work page doi:10.1214/18-sts666
[9]

Optimal estimation of Gaussian mixtures via denoised method of moments.The Annals of Statistics, 48(4):1981 – 2007,

Yihong Wu and Pengkun Yang. Optimal estimation of Gaussian mixtures via denoised method of moments.The Annals of Statistics, 48(4):1981 – 2007,

1981
[10]

URL https://doi.org/10.1214/19-AOS1873

doi: 10.1214/19-AOS1873. URL https://doi.org/10.1214/19-AOS1873. Yuhong Yang and Andrew Barron. Information-theoretic determination of minimax rates of con- vergence.Annals of Statistics, pages 1564–1599,

work page doi:10.1214/19-aos1873
[11]

Discussion In this work, we developed fast, sample-efficient algorithms for density estimation in Hellinger

Appendix A. Discussion In this work, we developed fast, sample-efficient algorithms for density estimation in Hellinger. Our primary applications focused on mixtures of log-concave or Gaussian densities; classical total variation guarantees for minimum-distance estimation seem ripe for adoption with our recipe, and we believe this can achieve nearly-optim...

2017
[12]

Nothing in this presen- tation is notably different from the technique of Pensia et al

ProofFor a self-contained presentation, we will present a proof in this text. Nothing in this presen- tation is notably different from the technique of Pensia et al. (2023), and we will restrict to discrete distributions only for ease of notation. Also, for ease of notation, letN≜ 1 H2(p,q) . The proof technique will be to decompose the Hellinger distance...

2023
[13]

H2(p, q)≥ 1 2·log(4/H 2(p, q)) H2(p, q) All that remains is to show that for any suchI ∗, there exists anf τ with approximately the Hellinger distance of the interval’s contribution. Case 1:I ∗ = [1 + 1 2j+1 ,1 + 1 2j ).First, we upper bound the value of the sum fromI ∗: X x: p(x) q(x) ∈[1+ 1 2j+1 ,1+ 1 2j ) p p(x)− p q(x) 2 ≤ X x: p(x) q(x) ∈[1+ 1 2j+1 ,...

2005
[14]

+ ln 8 δ . a−b√a ≤r=⇒∆≤r √a=⇒∆ 2 −r 2a≤0 =⇒∆ 2 −r 2(∆ +b)≤0 =⇒∆ 2 −r 2∆−r 2b≤0 =⇒∆≤ r2 + √ r4 + 4r2b 2 =⇒∆≤r 2 +r √ b(root of the quadratic equation) We will continue using this shorthand notation ofa, b,∆, r. The proof will follow quickly via the same logic as Claim 2.22 of Compton and Valiant (2025): √a− √ b ≤min 2|a−b|√ b , p |a−b| (fora, b≥0; first pa...

2025
[15]

IfH 2(fθ, ˆfn)≥1/n, then Theorem 16 impliesH 2(fθ, ˆfn)≲d D H(fθ, ˆfn)

Our modified estimator will choose D=⌈log(4n)⌉and estimate: ˆfn ≜arg min fθ∈F dD H(fθ, fn)(21) The desired error guarantee now follows by the same proof strategy as Theorem 7: H2(f, ˆfn)≲H 2(f, fθ) + H2(fθ, ˆfn) 23 COMPTONLI Consider two cases. IfH 2(fθ, ˆfn)≥1/n, then Theorem 16 impliesH 2(fθ, ˆfn)≲d D H(fθ, ˆfn). Otherwise, we can use the upper bound of...

2009
[16]

PIECEWISE-LINEAR APPROXIMATION We now present the required piecewise-linear approximation result for log-concave densities

G.1.1. PIECEWISE-LINEAR APPROXIMATION We now present the required piecewise-linear approximation result for log-concave densities. This result follows from the proof of Theorem 12 of Diakonikolas et al. (2016): 28 DENSITY ESTIMATION FORHELLINGER VIA MINIMUM-DISTANCE ESTIMATORS Lemma 20 (Piecewise-linear approximation for log-concave; follows from Diakonik...

2016
[17]

Proof Without loss of generality, supposep(x)≥q(x)in the support

Then, there exists some intervalI ⊆[0,1]where sZ I p(x)− sZ I q(x) !2 ≳H 2(p, q) Note how here we are proving a reverse data processing inequality that crucially only loses a constant fraction of squared Hellinger distance, instead of alog(2/H 2(p, q))term. Proof Without loss of generality, supposep(x)≥q(x)in the support. (Ifp, qintersect, we can break th...

2017
[18]

Any degree-dpolynomial can be defined by its coefficients inR d+1

We will do so by approximately determining whether there is a polynomial withA 1 distance at mostτ, and employing a binary search over the correct value ofτ. Any degree-dpolynomial can be defined by its coefficients inR d+1. LetC τ ⊂R d+1 denote the set of such polynomials. These polynomials must: (i) be non-negative overJ, and (ii) must satisfy an upper ...

1989
[19]

special points

Note that the analogous result in Acharya et al. (2017) forA 1 distance only has additive error, and this is why ourH1 oracle also incurs multiplicative error. Additionally, since the runtime of this subroutine isO(s)it is not a dominating term in the final runtime. Theorem 26Fix an intervalJ= [a, b]⊆[−1,1]. Suppose there aressamples in this rangea≤ x1 ≤ ...

2017
[20]

We will chooseFas the class of3k-piecewise degree≲log(n)polynomial densities. By Corollary 12: H2(f, ˆfn)≲inf fθ∈F t d log(n)·H 2(f, fθ) log(2k/H2(f, fθ)) + klog 2(n)·(log(n) + log(2/δ)) n We use existing results that imply any Gaussian density is approximated withinεsquared Hellinger distance by a 3-piecewise degree≲log(1/ε)polynomial (Lemma 36 of Chan e...

2014

[1] [1]

Attainability of two-point testing rates for finite-sample location estimation.arXiv preprint arXiv:2502.05730,

Spencer Compton and Gregory Valiant. Attainability of two-point testing rates for finite-sample location estimation.arXiv preprint arXiv:2502.05730,

arXiv

[2] [2]

Ratio covers of convex sets and optimal mixture density estimation.arXiv preprint arXiv:2602.16142,

Spencer Compton, Gábor Lugosi, Jaouad Mourtada, Jian Qian, and Nikita Zhivotovskiy. Ratio covers of convex sets and optimal mixture density estimation.arXiv preprint arXiv:2602.16142,

arXiv

[3] [3]

Differentially private learning of structured discrete distributions.Advances in Neural Information Processing Systems, 28, 2015a

Ilias Diakonikolas, Moritz Hardt, and Ludwig Schmidt. Differentially private learning of structured discrete distributions.Advances in Neural Information Processing Systems, 28, 2015a. Ilias Diakonikolas, Daniel M Kane, and Vladimir Nikishkin. Optimal algorithms and lower bounds for testing closeness of structured distributions. In2015 IEEE 56th Annual Sy...

Pith/arXiv arXiv

[4] [4]

Near-optimal closeness testing of discrete histogram distributions.arXiv preprint arXiv:1703.01913, 2017a

Ilias Diakonikolas, Daniel M Kane, and Vladimir Nikishkin. Near-optimal closeness testing of discrete histogram distributions.arXiv preprint arXiv:1703.01913, 2017a. Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Learning multivariate log-concave distri- butions. InConference on Learning Theory, pages 711–727. PMLR, 2017b. Ilias Diakonikolas, Da...

Pith/arXiv arXiv

[5] [5]

URL https://doi.org/10.1214/16-AOS1480

doi: 10.1214/16-AOS1480. URL https://doi.org/10.1214/16-AOS1480. Gil Kur, Yuval Dagan, and Alexander Rakhlin. Optimality of maximum likelihood for log-concave density estimation and bounded convex regression.arXiv preprint arXiv:1903.05315,

work page doi:10.1214/16-aos1480 1903

[6] [6]

Nonparametric mle for gaussian location mixtures: certified computation and generic behavior.arXiv preprint arXiv:2503.20193,

Yury Polyanskiy and Mark Sellke. Nonparametric mle for gaussian location mixtures: certified computation and generic behavior.arXiv preprint arXiv:2503.20193,

arXiv

[7] [7]

Self-regularizing property of nonparametric maximum likelihood estimator in mixture models.arXiv preprint arXiv:2008.08244,

Yury Polyanskiy and Yihong Wu. Self-regularizing property of nonparametric maximum likelihood estimator in mixture models.arXiv preprint arXiv:2008.08244,

arXiv 2008

[8] [8]

URLhttps://doi.org/10.1214/ 18-STS666

doi: 10.1214/18-STS666. URLhttps://doi.org/10.1214/ 18-STS666. Richard J Samworth. Nonparametric inference under shape constraints: past, present and future. arXiv preprint arXiv:2509.26040,

work page doi:10.1214/18-sts666

[9] [9]

Optimal estimation of Gaussian mixtures via denoised method of moments.The Annals of Statistics, 48(4):1981 – 2007,

Yihong Wu and Pengkun Yang. Optimal estimation of Gaussian mixtures via denoised method of moments.The Annals of Statistics, 48(4):1981 – 2007,

1981

[10] [10]

URL https://doi.org/10.1214/19-AOS1873

doi: 10.1214/19-AOS1873. URL https://doi.org/10.1214/19-AOS1873. Yuhong Yang and Andrew Barron. Information-theoretic determination of minimax rates of con- vergence.Annals of Statistics, pages 1564–1599,

work page doi:10.1214/19-aos1873

[11] [11]

Discussion In this work, we developed fast, sample-efficient algorithms for density estimation in Hellinger

Appendix A. Discussion In this work, we developed fast, sample-efficient algorithms for density estimation in Hellinger. Our primary applications focused on mixtures of log-concave or Gaussian densities; classical total variation guarantees for minimum-distance estimation seem ripe for adoption with our recipe, and we believe this can achieve nearly-optim...

2017

[12] [12]

Nothing in this presen- tation is notably different from the technique of Pensia et al

ProofFor a self-contained presentation, we will present a proof in this text. Nothing in this presen- tation is notably different from the technique of Pensia et al. (2023), and we will restrict to discrete distributions only for ease of notation. Also, for ease of notation, letN≜ 1 H2(p,q) . The proof technique will be to decompose the Hellinger distance...

2023

[13] [13]

H2(p, q)≥ 1 2·log(4/H 2(p, q)) H2(p, q) All that remains is to show that for any suchI ∗, there exists anf τ with approximately the Hellinger distance of the interval’s contribution. Case 1:I ∗ = [1 + 1 2j+1 ,1 + 1 2j ).First, we upper bound the value of the sum fromI ∗: X x: p(x) q(x) ∈[1+ 1 2j+1 ,1+ 1 2j ) p p(x)− p q(x) 2 ≤ X x: p(x) q(x) ∈[1+ 1 2j+1 ,...

2005

[14] [14]

+ ln 8 δ . a−b√a ≤r=⇒∆≤r √a=⇒∆ 2 −r 2a≤0 =⇒∆ 2 −r 2(∆ +b)≤0 =⇒∆ 2 −r 2∆−r 2b≤0 =⇒∆≤ r2 + √ r4 + 4r2b 2 =⇒∆≤r 2 +r √ b(root of the quadratic equation) We will continue using this shorthand notation ofa, b,∆, r. The proof will follow quickly via the same logic as Claim 2.22 of Compton and Valiant (2025): √a− √ b ≤min 2|a−b|√ b , p |a−b| (fora, b≥0; first pa...

2025

[15] [15]

IfH 2(fθ, ˆfn)≥1/n, then Theorem 16 impliesH 2(fθ, ˆfn)≲d D H(fθ, ˆfn)

Our modified estimator will choose D=⌈log(4n)⌉and estimate: ˆfn ≜arg min fθ∈F dD H(fθ, fn)(21) The desired error guarantee now follows by the same proof strategy as Theorem 7: H2(f, ˆfn)≲H 2(f, fθ) + H2(fθ, ˆfn) 23 COMPTONLI Consider two cases. IfH 2(fθ, ˆfn)≥1/n, then Theorem 16 impliesH 2(fθ, ˆfn)≲d D H(fθ, ˆfn). Otherwise, we can use the upper bound of...

2009

[16] [16]

PIECEWISE-LINEAR APPROXIMATION We now present the required piecewise-linear approximation result for log-concave densities

G.1.1. PIECEWISE-LINEAR APPROXIMATION We now present the required piecewise-linear approximation result for log-concave densities. This result follows from the proof of Theorem 12 of Diakonikolas et al. (2016): 28 DENSITY ESTIMATION FORHELLINGER VIA MINIMUM-DISTANCE ESTIMATORS Lemma 20 (Piecewise-linear approximation for log-concave; follows from Diakonik...

2016

[17] [17]

Proof Without loss of generality, supposep(x)≥q(x)in the support

Then, there exists some intervalI ⊆[0,1]where sZ I p(x)− sZ I q(x) !2 ≳H 2(p, q) Note how here we are proving a reverse data processing inequality that crucially only loses a constant fraction of squared Hellinger distance, instead of alog(2/H 2(p, q))term. Proof Without loss of generality, supposep(x)≥q(x)in the support. (Ifp, qintersect, we can break th...

2017

[18] [18]

Any degree-dpolynomial can be defined by its coefficients inR d+1

We will do so by approximately determining whether there is a polynomial withA 1 distance at mostτ, and employing a binary search over the correct value ofτ. Any degree-dpolynomial can be defined by its coefficients inR d+1. LetC τ ⊂R d+1 denote the set of such polynomials. These polynomials must: (i) be non-negative overJ, and (ii) must satisfy an upper ...

1989

[19] [19]

special points

Note that the analogous result in Acharya et al. (2017) forA 1 distance only has additive error, and this is why ourH1 oracle also incurs multiplicative error. Additionally, since the runtime of this subroutine isO(s)it is not a dominating term in the final runtime. Theorem 26Fix an intervalJ= [a, b]⊆[−1,1]. Suppose there aressamples in this rangea≤ x1 ≤ ...

2017

[20] [20]

We will chooseFas the class of3k-piecewise degree≲log(n)polynomial densities. By Corollary 12: H2(f, ˆfn)≲inf fθ∈F t d log(n)·H 2(f, fθ) log(2k/H2(f, fθ)) + klog 2(n)·(log(n) + log(2/δ)) n We use existing results that imply any Gaussian density is approximated withinεsquared Hellinger distance by a 3-piecewise degree≲log(1/ε)polynomial (Lemma 36 of Chan e...

2014