On Robust Hypothesis Testing with respect to the Hellinger Distance

Ananda Theertha Suresh; Eeshan Modak; Sivaraman Balakrishnan

arXiv: 2510.16750 · v3 · submitted 2025-10-19 · 🧮 math.ST · cs.IT · math.IT · stat.ML · stat.TH

Pith reviewed 2026-05-18 06:48 UTC · model grok-4.3

classification 🧮 math.ST · cs.IT · math.IT · stat.ML · stat.TH

keywords robust hypothesis testing · Hellinger distance · misspecification · slack factor · composite hypothesis testing · chi-squared distance

The pith

Robust Hellinger hypothesis tests require the true distribution to be substantially closer to one hypothesis than the other.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies hypothesis testing in which samples may arise from a small perturbation of either of two fixed distributions rather than from one of them exactly. It asks for a test that remains reliable under such misspecification and correctly declares which of the two original distributions lies closer in Hellinger distance to the true law. The central result supplies a lower bound on the necessary gap between those two distances; without a sufficiently large gap the task is information-theoretically impossible. The same bound carries over to symmetric chi-squared distance, and the authors also construct an explicit test when each hypothesis is itself a Hellinger ball around a fixed center.
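The quantities in this summary can be made concrete with a small sketch. It assumes discrete distributions on a shared finite support and uses the normalization H² = ½∥√p − √q∥²₂; the function names and the example vectors are hypothetical and do not come from the paper. The sketch only computes which nominal distribution is nearer to a known law and by what ratio; it is not a statistical test on samples.

```python
import math

def hellinger(p, q):
    """Hellinger distance with H^2 = 1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same support size")
    h2 = 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(h2)

def nearer_hypothesis(p, p1, p2):
    """Return (label, ratio): the nearer hypothesis (1 or 2) and the
    ratio of the larger Hellinger distance to the smaller one."""
    d1, d2 = hellinger(p, p1), hellinger(p, p2)
    lo, hi = min(d1, d2), max(d1, d2)
    ratio = hi / lo if lo > 0 else math.inf
    return (1 if d1 <= d2 else 2), ratio

# Hypothetical example: p is a small perturbation of p1.
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.3, 0.5]
p = [0.45, 0.32, 0.23]
label, ratio = nearer_hypothesis(p, p1, p2)
print(label, ratio)
```

When the printed ratio is close to 1, the true law is nearly equidistant from both hypotheses, which is the regime the summary describes as intractable.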

Core claim

The paper establishes a quantitative lower bound on the slack factor: the true distribution must be measurably closer in Hellinger distance to one hypothesis than to the other before any test can reliably identify the nearer hypothesis under small misspecification. When the distances are nearly equal the problem is intractable. The bound is also shown to govern testing with respect to symmetric chi-squared distance, and a concrete test is supplied and analyzed for the composite setting in which each hypothesis is a Hellinger ball.

What carries the argument

The slack factor: the minimum ratio between the true distribution's Hellinger distances to the farther and to the nearer hypothesis that any robust test requires in order to succeed.
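One way to write this down, as a sketch only: the symbols p (true law), p_1 and p_2 (nominal hypotheses) and γ (slack factor) are assumed notation, and the robustness condition is a reading of the description on this page rather than a definition quoted from the paper. The Hellinger normalization and the constant are the ones quoted elsewhere on this page.

```latex
% Assumed notation: p is the true law, p_1 and p_2 the nominal hypotheses,
% H the Hellinger distance, \gamma \ge 1 the slack factor.
\[
  H^2(p, q) \;=\; \tfrac{1}{2}\,\bigl\lVert \sqrt{p} - \sqrt{q} \bigr\rVert_2^2 .
\]
% A \gamma-robust test is only required to answer correctly when one
% hypothesis is closer by a factor of at least \gamma:
\[
  \gamma\, H(p, p_1) \le H(p, p_2) \;\Longrightarrow\; \text{output } 1,
  \qquad
  \gamma\, H(p, p_2) \le H(p, p_1) \;\Longrightarrow\; \text{output } 2 .
\]
% The constant quoted for the lower bound simplifies as follows:
\[
  \frac{\sqrt{2}}{\sqrt{2}-1} \;=\; \sqrt{2}\,(\sqrt{2}+1) \;=\; 2 + \sqrt{2} \;\approx\; 3.414 .
\]
```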

If this is right

Any test that claims robustness must fail with high probability once the slack factor falls below the derived threshold.
The same quantitative gap is necessary when the underlying distance is symmetric chi-squared instead of Hellinger.
When each hypothesis is enlarged to a Hellinger ball, an explicit test exists whose error probability is controlled by the radius and the separation between centers.
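The link between the two distances mentioned above can be checked numerically. This sketch assumes the symmetric chi-squared distance is defined as Σ (p_i − q_i)² / (p_i + q_i), which may differ from the paper's normalization, and uses H² = ½∥√p − √q∥²₂. Under those assumptions an elementary calculation gives 2H² ≤ χ²_sym ≤ 4H²; the constants in the paper itself may be different.

```python
import math
import random

def hellinger_sq(p, q):
    """Squared Hellinger distance, H^2 = 1/2 * sum (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def sym_chi_sq(p, q):
    """ASSUMED definition: sum (p_i - q_i)^2 / (p_i + q_i)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def random_distribution(k, rng):
    """A random probability vector on k points (illustrative helper)."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
ratios = []
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    ratios.append(sym_chi_sq(p, q) / hellinger_sq(p, q))

# Each term satisfies (p-q)^2/(p+q) = (sqrt p - sqrt q)^2 * (sqrt p + sqrt q)^2/(p+q),
# and the last factor lies in [1, 2], so every ratio falls in [2, 4].
print(min(ratios), max(ratios))
```

Because the two distances agree up to constant factors in this sense, a statement about ratios of Hellinger distances translates into a statement about ratios of symmetric chi-squared distances, with the constants adjusted.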

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The slack-factor bound may serve as a design criterion for choosing nominal distributions that admit robust tests in practice.
Similar lower bounds could be derived for other f-divergences that satisfy the same triangle-type inequalities used here.
In high-dimensional or nonparametric regimes the same intractability threshold would force practitioners to enlarge the separation between hypotheses before data collection begins.

Load-bearing premise

The observed samples are drawn from a distribution that is a sufficiently small perturbation, in Hellinger distance, of exactly one of the two nominal hypotheses.

What would settle it

The lower bound would be refuted by a concrete counterexample: a distribution within the stated Hellinger perturbation radius whose distances to the two hypotheses are equal (or whose ratio falls below the derived slack factor), together with a test that still succeeds with high probability.

Figures

Figures reproduced from arXiv: 2510.16750 by Ananda Theertha Suresh, Eeshan Modak, Sivaraman Balakrishnan.

**Figure 1.** $D_1$ is a family of distributions all of whose members are $\frac{\sqrt{2}}{\sqrt{2}-1}$ times farther from $p_2$ than from $p_1$ in Hellinger distance. Likewise, all members of $D_2$ are $\frac{\sqrt{2}}{\sqrt{2}-1}$ times farther from $p_1$ than from $p_2$ in Hellinger distance. 3) We use Le Cam’s argument to show that the probability of error in distinguishing between $D_1^m$ and $D_2^m$ is at least $\frac{1}{3}$. 4) The above three points imply that if we had a γ-robust t…

**Figure 2.** An example of $p_{R_1,R_2}$ (perturbed around $p_1$) and $\bar{p}_{R_1,R_2}$ (perturbed around $p_2$) when $R_1 = \{2, 4\}$ and $R_2 = \{1, 3\}$. Consider the distribution $p_{R_1,R_2}$ obtained by perturbing $p_1$:

$$
p_{R_1,R_2}(x) =
\begin{cases}
1 - b, & x \in I_j,\ j \notin R_1,\ j \le Nm \\
1 - b + a_1, & x \in I_j,\ j \in R_1,\ j \le Nm \\
1 + b, & x \in I_j,\ j - Nm \notin R_2,\ j > Nm \\
1 + b - a_2, & x \in I_j,\ j - Nm \in R_2,\ j > Nm.
\end{cases}
$$

Note that this is indeed a distribution since the pr…

Original abstract

We study a variant of the simple hypothesis testing problem where observed samples do not necessarily come from either of the specified distributions, but rather from a close variant of them. In this setting, we require a test that is robust to misspecification and identifies which distribution is closer in Hellinger distance. If the underlying distribution is nearly equidistant from both hypotheses, the problem becomes intractable. Our main result is a lower bound on the slack factor, which quantifies how much closer the underlying distribution must be to one hypothesis relative to the other for any test to remain robust. We also demonstrate the implications of this result for testing with respect to symmetric chi-squared distance. Finally, we study an alternative way to specify robustness, where each hypothesis is a Hellinger ball around a fixed distribution. We provide and analyze a test for this composite hypothesis testing problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 4 minor

Summary. The manuscript studies robust binary hypothesis testing under Hellinger-distance misspecification: samples are drawn from a distribution that lies within a small Hellinger ball of one of two nominal distributions P0 or P1, and the goal is to identify which nominal distribution is closer to the true law. The central claim is a lower bound on the slack factor (the minimal ratio of Hellinger distances required for any test to remain robust). When the true distribution is nearly equidistant from both hypotheses the problem is intractable. The paper also derives consequences for testing with respect to symmetric chi-squared distance and supplies an explicit test together with risk analysis for the composite formulation in which each hypothesis is itself a Hellinger ball.

Significance. If the lower bound holds, the work supplies a precise quantitative limit on the robustness margin available under Hellinger contamination, which is directly relevant to misspecified or contaminated data settings. The explicit test and matching risk bounds for the composite-ball model, together with the reduction to the equidistant case for intractability, give the result both theoretical and constructive value. The derivations rest on standard properties of the Hellinger distance and contain no hidden uniformity assumptions or free parameters.

minor comments (4)

[Abstract] Abstract, paragraph on intractability: the precise mathematical condition under which the problem becomes intractable (near-equidistance) should be stated explicitly rather than described qualitatively.
[Implications for chi-squared] Section on implications for chi-squared distance: the translation from the Hellinger slack-factor bound to the symmetric chi-squared setting should include the explicit constant factors that arise from the relationship between the two distances.
[Composite hypothesis testing] Composite hypothesis section: the risk analysis for the proposed test is clear, but a short comparison of the achieved slack factor with the lower bound derived for the simple-hypothesis case would help the reader assess optimality.
[Main theorem] Notation: ensure that the definition of the slack factor is repeated or cross-referenced at the first use in the main theorem statement.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and for the positive assessment, including the recommendation for minor revision. The referee's summary correctly identifies the main contributions: the lower bound on the slack factor for robust identification under Hellinger misspecification, the intractability result when the true distribution is nearly equidistant, the implications for symmetric chi-squared distance, and the explicit test with risk bounds for the composite Hellinger-ball formulation. We appreciate the recognition of the theoretical and constructive value of these results.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from standard Hellinger properties

full rationale

The paper derives its main lower bound on the slack factor directly from the definitions of the Hellinger-ball contamination model, the slack factor itself, and standard properties of the Hellinger distance, without any reduction to fitted inputs, self-referential definitions, or load-bearing self-citations. The intractability result for the equidistant case is presented as a separate modeling choice rather than a derived prediction, and the composite hypothesis testing section provides an explicit test construction with matching risk bounds. All steps remain independent of the target result and rely on externally verifiable distance properties, making the central claim self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard properties of probability measures and the Hellinger metric; no free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

standard math: Hellinger distance is a valid metric on probability distributions and satisfies standard properties used in hypothesis testing. Invoked throughout the definition of closeness and robustness.

pith-pipeline@v0.9.0 · 5690 in / 1114 out tokens · 28537 ms · 2026-05-18T06:48:49.762284+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Our main result is a lower bound on the slack factor... γ* ≥ √2/(√2−1)

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: H²(p1,p2)=½∥√p1−√p2∥²₂ ... tensorization property
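The tensorization property named in the quoted passage is a standard fact about the Hellinger distance: for m i.i.d. samples, 1 − H²(p^⊗m, q^⊗m) = (1 − H²(p, q))^m under the normalization H² = ½∥√p − √q∥²₂. The sketch below checks it by brute force on a small hypothetical example; the helper names and the distributions are illustrative and not taken from the paper.

```python
import itertools
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance, H^2 = 1/2 * sum (sqrt(p_i) - sqrt(q_i))^2."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def product_distribution(p, m):
    """Joint pmf of m i.i.d. draws from p, listed over all m-tuples of outcomes."""
    return [math.prod(p[i] for i in idx)
            for idx in itertools.product(range(len(p)), repeat=m)]

# Hypothetical example distributions on three points.
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.3, 0.5]
m = 4

# Direct computation on the 3^m-point product space.
direct = hellinger_sq(product_distribution(p1, m), product_distribution(p2, m))
# Same quantity from the single-sample distance via tensorization.
via_tensorization = 1.0 - (1.0 - hellinger_sq(p1, p2)) ** m

print(direct, via_tensorization)
```

This is why the Hellinger distance is convenient for sample-complexity arguments: the m-sample distance follows from the single-sample distance without enumerating the product space.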

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 1 internal anchor

[1] P. J. Huber, “A robust version of the probability ratio test,” The Annals of Mathematical Statistics, pp. 1753–1758, 1965.

[2] B. C. Levy, “Robust hypothesis testing with a relative entropy tolerance,” IEEE Transactions on Information Theory, vol. 55, no. 1, pp. 413–421, 2008.

[3] G. Gül and A. M. Zoubir, “Minimax robust hypothesis testing,” IEEE Transactions on Information Theory, vol. 63, no. 9, pp. 5572–5587, 2017.

[4] F. Fangwei and S. Shiyi, “Hypothesis testing for arbitrarily varying source,” Acta Mathematica Sinica, vol. 12, no. 1, pp. 33–39, 1996.

[5] F. G. Brandão, A. W. Harrow, J. R. Lee, and Y. Peres, “Adversarial hypothesis testing and a quantum Stein’s lemma for restricted measurements,” IEEE Transactions on Information Theory, vol. 66, no. 8, pp. 5037–5054, 2020.

[6] L. Devroye and G. Lugosi, Combinatorial methods in density estimation. Springer Science & Business Media, 2001.

[7] O. Bousquet, D. Kane, and S. Moran, “The optimal approximation factor in density estimation,” in Conference on Learning Theory, pp. 318–341, PMLR, 2019.

[8] A. T. Suresh, “Robust hypothesis testing and distribution estimation in hellinger distance,” in International Conference on Artificial Intelligence and Statistics, pp. 2962–2970, PMLR, 2021.

[9] Y. Baraud, “Estimator selection with respect to hellinger-type risks,” Probability theory and related fields, vol. 151, pp. 353–401, 2011.

[10] B. Yu, “Assouad, Fano, and Le Cam,” in Festschrift for Lucien Le Cam: research papers in probability and statistics, pp. 423–435, Springer, 1997.

[11] E. Giné and R. Nickl, Mathematical foundations of infinite-dimensional statistical models. Cambridge university press, 2021.

[12] S. Mahalanabis and D. Stefankovic, “Density estimation in linear time,” arXiv preprint arXiv:0712.2869, 2007.
