pith. sign in

arxiv: 2601.11727 · v2 · submitted 2026-01-16 · 💻 cs.IT · math.IT

Asymptotically Optimal Tests for One- and Two-Sample Problems

Pith reviewed 2026-05-16 13:03 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords hypothesis testingasymptotic optimalityrelative entropyHoeffding testone-sampletwo-samplestrong converseKL divergence
0
0 comments X

The pith

Threshold tests of relative entropy between empirical distributions are asymptotically optimal for one- and two-sample hypothesis testing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Hoeffding's test for the one-sample problem, which thresholds the relative entropy between the empirical distribution and a known nominal distribution, is asymptotically optimal. It extends the same idea to the two-sample problem, proving that thresholding the relative entropy between the two empirical distributions achieves asymptotic optimality there as well. A strong converse result is also derived for the two-sample case. These results matter because they identify simple tests that attain the best possible rates for error probabilities in the large-sample limit, providing both theoretical guarantees and practical intuition for binary hypothesis testing when distributions are unknown.

Core claim

In the one-sample setting, a threshold test on the relative entropy between the empirical distribution and the nominal distribution is asymptotically optimal. In the two-sample setting, a similar threshold test on the relative entropy between the two empirical distributions is asymptotically optimal, and a strong converse holds for this test.

What carries the argument

The relative entropy, or Kullback-Leibler divergence, between empirical distributions, serving as the test statistic in a simple threshold test that achieves the optimal error exponent.

If this is right

  • The one-sample Hoeffding test achieves the optimal asymptotic error rate.
  • The two-sample relative entropy test matches the best possible exponent for distinguishing two unknown distributions.
  • A strong converse implies that no other test can do better in the asymptotic regime.
  • These tests provide an intuitive and computationally straightforward alternative to other optimal procedures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the relative entropy test is optimal asymptotically, it may serve as a benchmark for designing finite-sample tests with good performance.
  • Similar divergence-based thresholds could extend to multi-sample or sequential hypothesis testing problems.
  • Connections might exist to modern applications in machine learning where empirical distributions are compared via divergences.

Load-bearing premise

The observations are independent and identically distributed draws from the true distributions, with analysis focused on the limit as the number of samples goes to infinity.

What would settle it

Finding a pair of distributions where the error probability of the relative entropy threshold test decays at a strictly slower rate than the known optimal exponent given by the Chernoff information.

Figures

Figures reproduced from arXiv: 2601.11727 by Arick Grootveld, Biao Chen, Venkata Gandikota.

Figure 1
Figure 1. Figure 1: One-Sample Problem Taking cn = O  log n n  ensures that α(THoff,n) ≤ ε for sufficiently large n, and − lim inf n→∞ 1 n log β(THoff,n) ≥ D (P∥Q). (15) Combined with Theorem 2, this establishes the fact that the test THoff,n is asymptotically optimal with type II error exponent D (P∥Q). 2.3 Two-Sample Problem The two-sample test involves samples Xn ∼ P, and Y n ∼ Q, with both P and Q unknown. The task is t… view at source ↗
Figure 2
Figure 2. Figure 2: Two-Sample Problem distributions, one can show that the distributions within the shrinking KLD ball “look similar” from the perspective of other distributions outside of the ball, i.e., for any Q ∈ Pd with Q ̸= P, when n is large enough, we get D (F∥Q) ≈ D (P∥Q), ∀F ∈ B(P, cn). (17) This, when combined with Sanov’s theorem (Theorem 1), shows the optimality of Hoeffding’s test. Theorem 3. For the test descr… view at source ↗
read the original abstract

In this work, we revisit the one- and two-sample testing problems: binary hypothesis testing in which one or both distributions are unknown. For the one-sample test, we provide a more streamlined proof of the asymptotic optimality of Hoeffding's likelihood ratio test, which is equivalent to the threshold test of the relative entropy between the empirical distribution and the nominal distribution. The new proof offers an intuitive interpretation and naturally extends to the two-sample test where we show that a similar form of Hoeffding's test, namely a threshold test of the relative entropy between the two empirical distributions is also asymptotically optimal. A strong converse for the two-sample test is also obtained.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript revisits one- and two-sample binary hypothesis testing problems with unknown distributions. For the one-sample case it supplies a streamlined proof that Hoeffding's likelihood-ratio test is asymptotically optimal and is equivalent to a threshold test on the relative entropy between the empirical distribution and the nominal distribution. The same approach is extended to the two-sample problem, where a threshold test on the relative entropy between the two empirical distributions is shown to be asymptotically optimal; a strong converse is also derived for the two-sample setting.

Significance. If the derivations hold, the work is significant because it supplies an intuitive large-deviation argument for the optimality of relative-entropy threshold tests, extends the classical one-sample result to the two-sample case, and adds a strong converse. These contributions clarify fundamental limits for hypothesis testing when distributions are unknown and reinforce the role of Sanov's theorem and related rate functions in establishing asymptotic optimality.

minor comments (3)
  1. [§2 and §3] §2 and §3: the statements of the main theorems should explicitly list the technical conditions on the alphabet size, the support of the distributions, and the rate at which the threshold is allowed to grow with n.
  2. The proof sketches rely on standard properties of the empirical measure; a short paragraph recalling the precise large-deviation upper and lower bounds used (e.g., the exact form of the rate function) would improve readability for readers outside the immediate subfield.
  3. Figure 1 (if present) or the numerical examples: the caption should state the exact sample sizes and the precise threshold values employed so that the plots can be reproduced from the stated optimality claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and recommendation for minor revision. The referee's summary accurately captures the manuscript's contributions on the asymptotic optimality of Hoeffding's relative-entropy threshold tests for one- and two-sample problems, including the strong converse for the two-sample case.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper supplies a streamlined proof of asymptotic optimality for Hoeffding's likelihood-ratio test (equivalently, a relative-entropy threshold test) in the one-sample setting and shows that an analogous relative-entropy threshold between two empirical distributions is asymptotically optimal for the two-sample problem, together with a strong converse. These claims rest on the standard large-deviation rate functions of empirical measures under i.i.d. sampling; the paper explicitly states that it does not re-derive those properties. No equation reduces a claimed prediction or optimality statement to a fitted parameter or to a self-citation whose content is itself the target result. The derivation chain therefore remains self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard large-deviation principles for empirical measures and the definition of relative entropy; no free parameters, ad-hoc axioms, or new invented entities are introduced in the abstract.

axioms (1)
  • standard math Large-deviation principle for empirical distributions (Sanov's theorem)
    Implicit foundation for any asymptotic optimality statement that uses relative entropy of empirical measures.

pith-pipeline@v0.9.0 · 5406 in / 1304 out tokens · 41368 ms · 2026-05-16T13:03:16.564821+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 2 internal anchors

  1. [1]

    Asymptotically optimal tests for multinomial distributions,

    W. Hoeffding, “Asymptotically optimal tests for multinomial distributions,”The Annals of Mathematical Statistics, pp. 369–401, 1965

  2. [2]

    On universal hypotheses testing via large deviations,

    O. Zeitouni and M. Gutman, “On universal hypotheses testing via large deviations,”IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 285–290, 1991

  3. [3]

    Robust kullback-leibler divergence and universal hypothesis testing for continuous distributions,

    P. Yang and B. Chen, “Robust kullback-leibler divergence and universal hypothesis testing for continuous distributions,”IEEE Transactions on Information Theory, vol. 65, no. 4, pp. 2360–2373, 2018

  4. [4]

    f-divergence estimation and two-sample homo- geneity test under semiparametric density-ratio models,

    T. Kanamori, T. Suzuki, and M. Sugiyama, “f-divergence estimation and two-sample homo- geneity test under semiparametric density-ratio models,”IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 708–720, 2011

  5. [5]

    A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

    D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,”arXiv preprint arXiv:1610.02136, 2016

  6. [6]

    Integrating structured biological data by kernel maximum mean discrepancy,

    K. M. Borgwardt, A. Gretton, M. J. Rasch, H.-P. Kriegel, B. Schölkopf, and A. J. Smola, “Integrating structured biological data by kernel maximum mean discrepancy,”Bioinformatics, vol. 22, no. 14, pp. e49–e57, 2006. 9

  7. [7]

    Revisiting Classifier Two-Sample Tests

    D. Lopez-Paz and M. Oquab, “Revisiting classifier two-sample tests,”arXiv preprint arXiv:1610.06545, 2016

  8. [8]

    A kernel method for the two-sample-problem,

    A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola, “A kernel method for the two-sample-problem,”Advances in neural information processing systems, vol. 19, 2006

  9. [9]

    Universal hypothesis testing with kernels: Asymp- totically optimal tests for goodness of fit,

    S. Zhu, B. Chen, P. Yang, and Z. Chen, “Universal hypothesis testing with kernels: Asymp- totically optimal tests for goodness of fit,” inThe 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 2019, pp. 1544–1553

  10. [10]

    Asymptotically optimal one-and two-sample testing with kernels,

    S. Zhu, B. Chen, Z. Chen, and P. Yang, “Asymptotically optimal one-and two-sample testing with kernels,”IEEE Transactions on Information Theory, vol. 67, no. 4, pp. 2074–2092, 2021

  11. [11]

    T. M. Cover and J. A. Thomas,Elements of Information Theory 2nd Edition (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, July 2006

  12. [12]

    Continuity of quantum entropic quantities via almost convexity,

    A. Bluhm, Á. Capel, P. Gondolf, and A. Pérez-Hernández, “Continuity of quantum entropic quantities via almost convexity,”IEEE Transactions on Information Theory, vol. 69, no. 9, pp. 5869–5901, 2023

  13. [13]

    Rényi divergence and kullback-leibler divergence,

    T. Van Erven and P. Harremos, “Rényi divergence and kullback-leibler divergence,”IEEE Transactions on Information Theory, vol. 60, no. 7, pp. 3797–3820, 2014

  14. [14]

    Boyd and L

    S. Boyd and L. Vandenberghe,Convex optimization. Cambridge university press, 2004

  15. [15]

    A. W. Van der Vaart,Asymptotic statistics. Cambridge university press, 2000, vol. 3. 10