Testing Support Size More Efficiently Than Learning Histograms
Pith reviewed 2026-05-23 19:16 UTC · model grok-4.3
The pith
Support size testing can be done with O(n/(ε log n) log(1/ε)) samples by analyzing Chebyshev approximations outside their standard range.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Testing whether p is supported on at most n elements or ε-far from any such distribution in total variation distance can be performed with O(n/(ε log n) log(1/ε)) samples from p; the algorithm also yields better lower bounds on support size than prior methods, and the proof proceeds by analyzing Chebyshev polynomial approximations outside the range where they were originally designed to approximate well.
What carries the argument
Chebyshev polynomial approximations analyzed outside their designed range to bound the tester's sample complexity.
If this is right
- Support size testing requires only O(n/(ε log n) log(1/ε)) samples rather than the Θ(n/(ε² log n)) needed for histogram learning.
- The achieved upper bound nearly matches the known Ω(n/(ε log n)) lower bound.
- The same method produces larger lower bounds on support size than what follows from previous histogram-based approaches.
- The Chebyshev polynomial technique supplies a self-contained proof for the improved tester.
Where Pith is reading between the lines
- The efficiency gain may indicate that testing other simple distribution properties can also avoid the full cost of learning.
- Extensions of the Chebyshev analysis could tighten bounds for related testing tasks such as uniformity or monotonicity.
- The separation between testing and learning sample complexities might be quantified more sharply for support-size-like properties.
Load-bearing premise
The analysis of Chebyshev polynomial approximations outside the range where they are designed to be good approximations is valid and directly yields the improved sample complexity for support size testing.
What would settle it
A concrete distribution for which any tester using o(n/(ε log n) log(1/ε)) samples fails to distinguish support size at most n from being ε-far would falsify the claimed upper bound.
read the original abstract
Consider two problems about an unknown probability distribution $p$: 1. How many samples from $p$ are required to test if $p$ is supported on $n$ elements or not? Specifically, given samples from $p$, determine whether it is supported on at most $n$ elements, or it is "$\epsilon$-far" (in total variation distance) from being supported on $n$ elements. 2. Given $m$ samples from $p$, what is the largest lower bound on its support size that we can produce? The best known upper bound for problem (1) uses a general algorithm for learning the histogram of the distribution $p$, which requires $\Theta(\tfrac{n}{\epsilon^2 \log n})$ samples. We show that testing can be done more efficiently than learning the histogram, using only $O(\tfrac{n}{\epsilon \log n} \log(1/\epsilon))$ samples, nearly matching the best known lower bound of $\Omega(\tfrac{n}{\epsilon \log n})$. This algorithm also provides a better solution to problem (2), producing larger lower bounds on support size than what follows from previous work. The proof relies on an analysis of Chebyshev polynomial approximations outside the range where they are designed to be good approximations, and the paper is intended as an accessible self-contained exposition of the Chebyshev polynomial method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that support-size testing (distinguishing whether an unknown distribution p has support size ≤n or is ε-far in TV distance from any distribution with support ≤n) can be performed with O(n/(ε log n) log(1/ε)) samples. This improves on the Θ(n/(ε² log n)) sample bound obtained by reducing to histogram learning and nearly matches the Ω(n/(ε log n)) lower bound. The same technique yields improved lower bounds on support size. The proof is presented as a self-contained analysis of Chebyshev polynomial approximations extended outside their standard approximation interval.
Significance. If the central analysis holds, the result is significant for distribution property testing: it separates testing from learning for a natural property and demonstrates that Chebyshev-based polynomial methods can yield near-optimal testers without requiring full histogram recovery. The self-contained exposition of the polynomial technique is a positive contribution to the literature.
major comments (1)
- [§4] §4 (Chebyshev analysis outside the design interval): The central sample-complexity improvement from Θ(1/ε²) to O(1/ε) rests on the claim that the error incurred by the Chebyshev approximation when evaluated outside its nominal range remains small enough not to re-introduce an extra 1/ε factor. The manuscript states that the analysis is self-contained, but the provided derivation does not explicitly bound the out-of-range contribution (e.g., via a concrete lemma relating the minimax error on an enlarged interval to the final tester threshold). Without this bound, it is unclear whether the stated O(n/(ε log n) log(1/ε)) bound is achieved or whether hidden factors cancel the improvement.
minor comments (1)
- [§1] The abstract and introduction use “n” both for the support-size parameter and (implicitly) for the universe size; a single clarifying sentence in §1 would remove ambiguity.
Simulated Author's Rebuttal
We thank the referee for the careful review and for highlighting the significance of separating testing from learning in this setting. We address the sole major comment below.
read point-by-point responses
-
Referee: [§4] §4 (Chebyshev analysis outside the design interval): The central sample-complexity improvement from Θ(1/ε²) to O(1/ε) rests on the claim that the error incurred by the Chebyshev approximation when evaluated outside its nominal range remains small enough not to re-introduce an extra 1/ε factor. The manuscript states that the analysis is self-contained, but the provided derivation does not explicitly bound the out-of-range contribution (e.g., via a concrete lemma relating the minimax error on an enlarged interval to the final tester threshold). Without this bound, it is unclear whether the stated O(n/(ε log n) log(1/ε)) bound is achieved or whether hidden factors cancel the improvement.
Authors: We agree that an explicit bound on the out-of-range error would make the argument clearer and easier to verify. In the revised version we will add a short lemma (new Lemma 4.3) that directly relates the minimax error of the degree-d Chebyshev approximant on the enlarged interval [0,1+δ] (with δ = Θ(ε log n / n) as used by the tester) to the acceptance threshold. The lemma uses the standard bound |T_k(x)| ≤ (x + sqrt(x²-1))^k for x>1 together with the concrete choice of d and the approximation interval in our construction to show that the extraneous error is at most O(ε / log(1/ε)). This is absorbed into the existing O(log(1/ε)) factor without restoring a 1/ε² dependence, confirming that the claimed O(n/(ε log n) log(1/ε)) sample bound holds. We thank the referee for prompting this clarification. revision: yes
Circularity Check
No circularity; derivation is self-contained polynomial analysis
full rationale
The paper derives an improved sample complexity for support size testing via analysis of Chebyshev polynomial approximations, presented as a self-contained exposition. No load-bearing steps reduce to self-citations, fitted inputs renamed as predictions, or self-definitional equivalences. The central upper bound O(n/(ε log n) log(1/ε)) is obtained from explicit approximation error bounds rather than by construction from the input histogram learner or prior author results. This matches the default expectation of no significant circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard properties of Chebyshev polynomials and total variation distance in distribution testing
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The proof relies on an analysis of Chebyshev polynomial approximations outside the range where they are designed to be good approximations... Pd(x) := −δ Td(ψ(x)) ... Q(x) := 1 + e^{-m x} P(x)
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We show that testing can be done more efficiently than learning the histogram, using only O(n/(ε log n) log(1/ε)) samples
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A unified maximum likelihood approach for estimating symmetric properties of discrete distributions
Jayadev Acharya, Hirakendu Das, Alon Orlitsky, and Ananda Theertha Suresh. A unified maximum likelihood approach for estimating symmetric properties of discrete distributions. In , volume 70, pages 11--21, 2017
work page 2017
-
[2]
Refining the adaptivity notion in the huge object model
Tomer Adar and Eldar Fischer. Refining the adaptivity notion in the huge object model. In , 2024
work page 2024
-
[3]
Support testing in the huge object model
Tomer Adar, Eldar Fischer, and Amit Levi. Support testing in the huge object model. In , 2024
work page 2024
-
[4]
Distribution testing lower bounds via reductions from communication complexity
Eric Blais, Clément L Canonne, and Tom Gur. Distribution testing lower bounds via reductions from communication complexity. , 11(2):1--37, 2019
work page 2019
-
[5]
Learnability and the vapnik-chervonenkis dimension
Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Learnability and the vapnik-chervonenkis dimension. Journal of the ACM (JACM) , 36(4):929--965, 1989
work page 1989
-
[6]
Estimating the number of species: a review
John Bunge and Michael Fitzpatrick. Estimating the number of species: a review. Journal of the American statistical Association , 88(421):364--373, 1993
work page 1993
-
[7]
VC dimension and distribution-free sample-based testing
Eric Blais, Renato Ferreira Pinto Jr , and Nathaniel Harms. VC dimension and distribution-free sample-based testing. In , 2021
work page 2021
-
[8]
A survey on distribution testing: Your data is big
Cl \'e ment Canonne. A survey on distribution testing: Your data is big. but is it blue? , 2020
work page 2020
-
[9]
Testing for families of distributions via the fourier transform
Cl \'e ment L Canonne, Ilias Diakonikolas, and Alistair Stewart. Testing for families of distributions via the fourier transform. In , 2018
work page 2018
-
[10]
Ronald A Fisher, A Steven Corbet, and Carrington B Williams. The relation between the number of species and the number of individuals in a random sample of an animal population. The Journal of Animal Ecology , pages 42--58, 1943
work page 1943
-
[11]
Distribution testing under the parity trace
Renato Ferreira Pinto Jr. and Nathaniel Harms. Distribution testing under the parity trace. arXiv preprint arXiv:2304.01374 , 2023
-
[12]
Property testing and its connection to learning and approximation
Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. , 45(4):653--750, 1998
work page 1998
-
[13]
Introduction to property testing
Oded Goldreich. Introduction to property testing . Cambridge University Press, 2017
work page 2017
-
[14]
On the complexity of estimating the effective support size
Oded Goldreich. On the complexity of estimating the effective support size. ECCC, TR19-088 , 2019
work page 2019
-
[15]
Testing Bipartitness in an Augmented VDF Bounded-Degree Graph Model
Oded Goldreich. Testing bipartitness in an augmented vdf bounded-degree graph model. arXiv preprint arXiv:1905.03070 , 2019
work page internal anchor Pith review Pith/arXiv arXiv 1905
-
[16]
On the estimation of the number of classes in a population
Leo A Goodman. On the estimation of the number of classes in a population. The Annals of Mathematical Statistics , 20(4):572--579, 1949
work page 1949
-
[17]
I.J. Good. The population frequencies of species and the estimation of population parameters. Biometrika , pages 237--264, 1953
work page 1953
-
[18]
Oded Goldreich and Dana Ron. On sample-based testers. , 8(2):1--54, 2016
work page 2016
-
[19]
On the relation between the relative earth mover distance and the variation distance (an exposition)
Oded Goldreich and Dana Ron. On the relation between the relative earth mover distance and the variation distance (an exposition). In Oded Goldreich, editor, Computational Complexity and Property Testing: On the Interplay Between Randomness and Computation , pages 141--151. Springer Cham, 2020
work page 2020
-
[20]
Testing distributions of huge objects
Oded Goldreich and Dana Ron. Testing distributions of huge objects. TheoretiCS , 2, 2023
work page 2023
-
[21]
The optimal sample complexity of PAC learning
Steve Hanneke. The optimal sample complexity of PAC learning. , 17(38):1--15, 2016
work page 2016
-
[22]
Steve Hanneke. CSTheory stackexchange answer. https://cstheory.stackexchange.com/questions/40161/proper-pac-learning-vc-dimension-bounds, 2019. Accessed 2024-09-10
work page 2019
-
[23]
Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. In , pages 3189--3221. PMLR, 2018
work page 2018
-
[24]
The broad optimality of profile maximum likelihood
Yi Hao and Alon Orlitsky. The broad optimality of profile maximum likelihood. , 2019
work page 2019
-
[25]
Unified sample-optimal property estimation in near-linear time
Yi Hao and Alon Orlitsky. Unified sample-optimal property estimation in near-linear time. In , 2019
work page 2019
-
[26]
Online versus offline adversaries in property testing
Esty Kelman, Ephraim Linder, and Sofya Raskhodnikova. Online versus offline adversaries in property testing. In , pages 65:1--65:18, 2025
work page 2025
-
[27]
Estimating the effective support size in constant query complexity
Shyam Narayanan and Jakub T e tek. Estimating the effective support size in constant query complexity. In , pages 242--252, 2023
work page 2023
-
[28]
OEIS . OEIS sequence A008310 . https://oeis.org/A008310. Accessed 2024-08-13
work page 2024
-
[29]
Tolerant property testing and distance approximation
Michal Parnas, Dana Ron, and Ronitt Rubinfeld. Tolerant property testing and distance approximation. Journal of Computer and System Sciences , 72(6):1012--1042, 2006
work page 2006
-
[30]
Almost optimal distribution-free sample-based testing of k-modality
Dana Ron and Asaf Rosin. Almost optimal distribution-free sample-based testing of k-modality. In , 2020
work page 2020
-
[31]
Optimal distribution-free sample-based testing of subsequence-freeness with one-sided error
Dana Ron and Asaf Rosin. Optimal distribution-free sample-based testing of subsequence-freeness with one-sided error. , 14(1):1--31, 2022
work page 2022
-
[32]
Strong lower bounds for approximating distribution support size and the distinct elements problem
Sofya Raskhodnikova, Dana Ron, Amir Shpilka, and Adam Smith. Strong lower bounds for approximating distribution support size and the distinct elements problem. , 39(3):813--842, 2009
work page 2009
-
[33]
Gregory Valiant and Paul Valiant. Estimating the unseen: an n/ (n) -sample estimator for entropy and support size, shown optimal via new CLTs . In , 2011
work page 2011
-
[34]
The power of linear estimators
Gregory Valiant and Paul Valiant. The power of linear estimators. In , 2011
work page 2011
-
[35]
Instance optimal learning of discrete distributions
Gregory Valiant and Paul Valiant. Instance optimal learning of discrete distributions. In , 2016
work page 2016
-
[36]
Estimating the unseen: improved estimators for entropy and other properties
Gregory Valiant and Paul Valiant. Estimating the unseen: improved estimators for entropy and other properties. , 2017
work page 2017
-
[37]
Chebyshev polynomials, moment matching, and optimal estimation of the unseen
Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. The Annals of Statistics , 47(2):857--883, 2019
work page 2019
-
[38]
Polynomial methods in statistical inference: theory and practice
Yihong Wu and Pengkun Yang. Polynomial methods in statistical inference: theory and practice . Foundations and Trends in Communications and Information Theory. now Publishers, 2020
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.