Approximations for the number of maxima and near-maxima in independent data
Pith reviewed 2026-05-22 15:50 UTC · model grok-4.3
The pith
The number of maxima and near-maxima in iid samples can be approximated using logarithmic, Poisson and negative binomial distributions with explicit total variation error bounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive explicit error bounds in total variation distance when approximating the number of observations equal to the maximum of the sample (in the case where X is discrete) or the number of observations within a given distance of an order statistic of the sample (in the case where X is absolutely continuous). The logarithmic and Poisson distributions are used as approximations in the discrete case, with proofs which include the development of Stein's method for a logarithmic target distribution. In the absolutely continuous case our approximations are by the negative binomial distribution, and are established by considering negative binomial approximation for mixed binomials. The cases where X is geometric, Gumbel and uniform are used as illustrative examples.
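The mixed-binomial structure behind the continuous case can be made concrete in its simplest instance. This is a minimal sketch under three assumptions: the order statistic is the sample maximum M (the paper allows general order statistics), N counts the other n - 1 observations lying in (M - a, M), and X ~ Uniform(0, 1). Conditional on M = m, the other observations are iid with the law of X given X < m. So N given M = m is Binomial(n - 1, p(m)) with p(m) = min(a, m)/m, and the law of N follows by integrating over the density n m^(n-1) of M. The function name is hypothetical, and the paper's negative binomial parameters and error bounds are not reproduced.

```python
import math


def near_max_pmf_uniform(n, a, grid=4000):
    """Hypothetical helper: pmf of N, the number of the other n - 1 observations
    in (M - a, M), where M is the maximum of n iid Uniform(0, 1) variables.

    Given M = m, the other n - 1 points are iid Uniform(0, m), so
    N | M = m ~ Binomial(n - 1, p(m)) with p(m) = min(a, m) / m.
    M has density n * m**(n - 1) on (0, 1); the mixture is integrated with
    the midpoint rule on `grid` equal cells.
    """
    binom = [math.comb(n - 1, k) for k in range(n)]
    pmf = [0.0] * n                          # N takes values 0, ..., n - 1
    h = 1.0 / grid
    for j in range(grid):
        m = (j + 0.5) * h                    # midpoint of cell j
        weight = n * m ** (n - 1) * h        # approximate P(M in cell j)
        pm = min(a, m) / m                   # success probability given M = m
        for k in range(n):
            pmf[k] += weight * binom[k] * pm**k * (1.0 - pm) ** (n - 1 - k)
    return pmf
```

As a check, the same representation gives E[N] = n a - a^n. The quadrature should reproduce this, so near_max_pmf_uniform(20, 0.1) should have a mean close to 2.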
What carries the argument
Stein's method for a logarithmic target distribution (discrete case), combined with negative binomial approximation for mixed binomials (absolutely continuous case), bounds the total variation distance between the count of maxima or near-maxima and its approximating distribution.
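One common route to a Stein characterization of a discrete target uses the ratio of consecutive probabilities. For the logarithmic law with parameter θ, P(L = k) = -θ^k / (k log(1 - θ)) for k ≥ 1, so θ k P(L = k) = (k + 1) P(L = k + 1). This gives E[θ L g(L + 1) - L g(L)] = -P(L = 1) g(1) for bounded g. Restricting to g with g(1) = 0 makes the right-hand side vanish. Taking g to be the indicator of a single point {j + 1} recovers the ratio relation, so the identity characterizes the logarithmic law on {1, 2, ...}. The sketch below checks the identity numerically. It illustrates these assumptions and is not necessarily the Stein operator the paper develops; the function names are hypothetical.

```python
import math


def logarithmic_pmf(k, theta):
    """Logarithmic(theta) pmf: -theta**k / (k * log(1 - theta)) for k >= 1."""
    return -theta**k / (k * math.log1p(-theta))


def stein_residual(g, theta, kmax=2000):
    """E[theta * L * g(L + 1) - L * g(L)] for L ~ Logarithmic(theta).

    The series is truncated at kmax; the omitted tail is O(theta**kmax).
    By the ratio identity theta * k * P(L = k) = (k + 1) * P(L = k + 1),
    the exact value is -P(L = 1) * g(1).
    """
    return sum(
        logarithmic_pmf(k, theta) * (theta * k * g(k + 1) - k * g(k))
        for k in range(1, kmax + 1)
    )
```

For any bounded g with g(1) = 0, for example g(k) = sin(k) for k ≥ 2, the residual should vanish up to rounding. A Stein equation for the logarithmic target would be built around this identity under this choice of operator.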
If this is right
- Explicit error bounds hold for the geometric distribution.
- Explicit error bounds hold for the Gumbel distribution.
- Explicit error bounds hold for the uniform distribution.
- The approximations yield concrete rates that can be used when exact computation of the count distribution is intractable.
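For the Gumbel example, the mixed binomial is explicit. This is a minimal sketch under three assumptions: the observations are standard Gumbel with F(x) = exp(-exp(-x)), the reference order statistic is the maximum M, and N is the number of other observations in (M - a, M). Since P(n e^(-M) > t) = F(log(n/t))^n = e^(-t), the variable E = n e^(-M) is exactly Exp(1), and N given E is Binomial(n - 1, 1 - exp(-(e^a - 1) E / n)). A Poisson-type heuristic (ours, not a result quoted from the paper) then suggests that for fixed a and large n, N is close to the geometric law P(N = k) = e^(-a) (1 - e^(-a))^k, k ≥ 0. That is a negative binomial with shape 1, because a Poisson variable with exponentially distributed mean (e^a - 1) E is geometric. The sketch computes the exact law of N by quadrature, then its total variation distance to that geometric law. The function names are hypothetical, and the paper's own negative binomial parameters and bounds are not reproduced.

```python
import math


def near_max_pmf_gumbel(n, a, grid=10000):
    """Hypothetical helper: pmf of N, the number of the other n - 1 observations
    in (M - a, M), where M is the maximum of n iid standard Gumbel variables
    with F(x) = exp(-exp(-x)).

    E = n * exp(-M) is exactly Exp(1) here, and given M the other points are
    iid from F restricted to (-inf, M), so
    N | E ~ Binomial(n - 1, 1 - exp(-(e**a - 1) * E / n)).
    The mixture over E is integrated by the midpoint rule in u = 1 - exp(-E),
    which is Uniform(0, 1).
    """
    c = math.expm1(a)                        # e**a - 1
    binom = [math.comb(n - 1, k) for k in range(n)]
    pmf = [0.0] * n                          # N takes values 0, ..., n - 1
    for j in range(grid):
        u = (j + 0.5) / grid                 # midpoint of cell j in (0, 1)
        e = -math.log1p(-u)                  # E = -log(1 - u)
        p = -math.expm1(-c * e / n)          # P(X_i > M - a | M)
        for k in range(n):
            pmf[k] += binom[k] * p**k * (1.0 - p) ** (n - 1 - k) / grid
    return pmf


def tv_to_geometric(pmf, a):
    """Total variation distance between pmf (on 0..len(pmf)-1) and the
    geometric law P(k) = exp(-a) * (1 - exp(-a))**k on k = 0, 1, 2, ..."""
    r = -math.expm1(-a)                      # 1 - exp(-a)
    geo = [math.exp(-a) * r**k for k in range(len(pmf))]
    tail = r ** len(pmf)                     # geometric mass on k >= len(pmf)
    return 0.5 * (sum(abs(x - y) for x, y in zip(pmf, geo)) + tail)
```

Comparing, say, n = 10, 50 and 200 shows how fast the heuristic limit is approached. The cost of the quadrature grows roughly in proportion to n times grid.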
Where Pith is reading between the lines
- The bounds may help determine when simulation of extreme counts becomes reliable for moderate sample sizes.
- The Stein's method development for the logarithmic distribution could extend to counts based on other record statistics.
- Numerical checks of the bounds for small to moderate n would indicate when the approximations are tight enough for applications.
Load-bearing premise
The n observations are independent and identically distributed.
What would settle it
For the geometric distribution with small n, compute the exact total variation distance between the distribution of the number of maxima and the logarithmic approximation and verify whether it stays below the paper's explicit bound.
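The check proposed above can be set up in a few lines. The sketch assumes X is geometric on {0, 1, 2, ...} with P(X = x) = p (1 - p)^x. It computes the exact law of the number of maxima K_n from P(K_n = k) = Σ_x C(n, k) P(X = x)^k P(X < x)^(n - k), then measures its total variation distance to a logarithmic law with parameter θ. Taking θ = p is our illustrative choice, suggested by a Poisson-type heuristic for large n, and is not necessarily the paper's parametrization. The paper's explicit bound is not reproduced, so the output is only the left-hand side of the comparison. The function names are hypothetical.

```python
import math


def max_count_pmf_geometric(n, p, tol=1e-15):
    """Hypothetical helper: exact pmf of K_n, the number of observations equal to
    the sample maximum, for n iid Geometric(p) variables on {0, 1, 2, ...}
    with P(X = x) = p * (1 - p)**x.

    Uses P(K_n = k) = sum_x C(n, k) * P(X = x)**k * P(X < x)**(n - k) and stops
    once n * P(X >= x) < tol, which bounds the omitted probability mass.
    Plain floats are used, so keep n moderate (roughly n <= 1000).
    """
    q = 1.0 - p
    binom = [math.comb(n, k) for k in range(n + 1)]
    pmf = [0.0] * (n + 1)                    # pmf[k]; pmf[0] stays 0
    x = 0
    while n * q**x >= tol:
        px = p * q**x                        # P(X = x)
        below = 1.0 - q**x                   # P(X < x); 0.0 when x = 0
        for k in range(1, n + 1):
            pmf[k] += binom[k] * px**k * below ** (n - k)
        x += 1
    return pmf


def logarithmic_pmf(k, theta):
    """Logarithmic(theta) pmf: -theta**k / (k * log(1 - theta)) for k >= 1."""
    return -theta**k / (k * math.log1p(-theta))


def tv_to_logarithmic(n, p, theta):
    """Total variation distance between the law of K_n and Logarithmic(theta)."""
    pmf = max_count_pmf_geometric(n, p)
    lg = [logarithmic_pmf(k, theta) for k in range(1, n + 1)]
    tail = max(0.0, 1.0 - sum(lg))           # logarithmic mass on k > n, where K_n has none
    return 0.5 * (sum(abs(a - b) for a, b in zip(pmf[1:], lg)) + tail)
```

For geometric samples the exact law of K_n fluctuates with log n rather than converging. It is therefore worth evaluating tv_to_logarithmic(n, p, p) over a range of n rather than at a single value.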
Original abstract
In the setting where we have $n$ independent observations of a random variable $X$, we derive explicit error bounds in total variation distance when approximating the number of observations equal to the maximum of the sample (in the case where $X$ is discrete) or the number of observations within a given distance of an order statistic of the sample (in the case where $X$ is absolutely continuous). The logarithmic and Poisson distributions are used as approximations in the discrete case, with proofs which include the development of Stein's method for a logarithmic target distribution. In the absolutely continuous case our approximations are by the negative binomial distribution, and are established by considering negative binomial approximation for mixed binomials. The cases where $X$ is geometric, Gumbel and uniform are used as illustrative examples.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives explicit total-variation error bounds for two approximation problems: (i) the number of observations equal to the sample maximum when X is discrete, approximated by logarithmic or Poisson distributions via a new Stein-method framework, and (ii) the number of observations lying within a fixed distance of an order statistic when X is absolutely continuous, approximated by the negative binomial distribution via negative-binomial bounds for mixed binomials. The claims are illustrated with geometric, Gumbel, and uniform examples.
Significance. If the error bounds are valid, the work supplies the first explicit, non-asymptotic total-variation guarantees for these counts, which appear in extreme-value and record statistics. The development of Stein’s method for the logarithmic distribution and the general mixed-binomial negative-binomial bounds are technically useful contributions that could be applied beyond the present setting.
major comments (1)
- [§4] §4 (absolutely continuous case): the representation of the near-maxima count as a mixed binomial conditions on the realized order statistic, yet the success indicators remain dependent on that conditioning variable. The general negative-binomial approximation bounds invoked in the proof do not appear to insert an explicit remainder term controlling this residual dependence; without it the claimed uniform TV bound may fail to hold. Please supply the precise statement of the mixed-binomial lemma and verify that the dependence is absorbed in the error term.
minor comments (2)
- [Abstract / Introduction] The abstract states that the proofs include 'the development of Stein's method for a logarithmic target distribution'; a short paragraph in the introduction highlighting the key technical novelty (e.g., the choice of Stein operator or the coupling) would improve readability.
- [§3–4] Notation for the distance parameter in the continuous case (denoted variously as 'given distance' or 'ε') should be fixed once and used consistently in all statements of the main theorems.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on our manuscript. We address the major comment below and have revised the manuscript to incorporate the requested clarifications.
Referee: §4 (absolutely continuous case): the representation of the near-maxima count as a mixed binomial conditions on the realized order statistic, yet the success indicators remain dependent on that conditioning variable. The general negative-binomial approximation bounds invoked in the proof do not appear to insert an explicit remainder term controlling this residual dependence; without it the claimed uniform TV bound may fail to hold. Please supply the precise statement of the mixed-binomial lemma and verify that the dependence is absorbed in the error term.
Authors: We appreciate the referee's observation and agree that additional clarity is needed on this point. In the revised manuscript we have inserted the precise statement of the mixed-binomial negative-binomial approximation result as a new Lemma 4.2, which is a direct specialization of the general bounds from the literature we cite. The proof now explicitly decomposes the total-variation distance as E[d_TV(conditional law | order statistic)] + d_TV(law of the mixing measure, target mixing measure). The first term is controlled by the standard mixed-binomial error (which already incorporates any dependence among the indicators that survives conditioning), while the second term bounds the variability induced by the dependence on the realized order statistic. Because both terms are bounded uniformly in the underlying distribution, the claimed uniform TV bound continues to hold. We have added this decomposition and the full statement of Lemma 4.2 to Section 4. revision: yes
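A generic form of this decomposition follows from two elementary facts about total variation. The notation below is ours, not the paper's. Z is the conditioning order statistic with law μ, and P_z is the conditional law of the count given Z = z. The target is a mixture of kernels Q_z over a mixing law ν; a negative binomial, for example, is a Poisson mixture over a gamma mixing law. With Z ~ μ:

```latex
\begin{aligned}
d_{TV}\Big(\int P_z\,\mu(dz),\ \int Q_z\,\nu(dz)\Big)
&\le d_{TV}\Big(\int P_z\,\mu(dz),\ \int Q_z\,\mu(dz)\Big)
   + d_{TV}\Big(\int Q_z\,\mu(dz),\ \int Q_z\,\nu(dz)\Big) \\
&\le \mathbb{E}\big[d_{TV}(P_Z, Q_Z)\big] + d_{TV}(\mu, \nu).
\end{aligned}
```

The first line is the triangle inequality. In the second line, the first term is bounded by moving the supremum over events inside the integral. The second term uses the fact that pushing two mixing laws through the same kernel z ↦ Q_z cannot increase their total variation distance.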
Circularity Check
Derivations use Stein's method on standard identities without self-referential reduction
full rationale
The paper establishes explicit total variation bounds by applying Stein's method to the logarithmic distribution (discrete maxima case) and negative binomial approximation to mixed binomials (continuous near-maxima case). These steps rest on the i.i.d. assumption to express counts as sums of indicators, followed by development of Stein equations and standard approximation results for the target distributions. No equation or bound is shown to equal a fitted parameter or quantity defined from the same sample; the error bounds are derived from distributional properties rather than by construction from the target count itself. Self-citations, if present, are not load-bearing for the central claims.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The n observations are independent and identically distributed.
- standard math: Stein's method applies to the logarithmic and negative-binomial target distributions with explicit error bounds.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
unclear: Relation between the paper passage and the cited Recognition theorem.
We derive explicit error bounds in total variation distance when approximating the number of observations equal to the maximum of the sample (discrete case) or within a given distance of an order statistic (absolutely continuous case).
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] R. Arratia, L. Goldstein and F. Kochman (2019). Size bias for one and all. Probab. Surv. 16: 1–61
- [2] A. D. Barbour, L. Holst and S. Janson (1992). Poisson Approximation. Oxford University Press, Oxford
- [3] J. J. A. M. Brands, F. W. Steutel and R. J. G. Wilms (1994). On the number of maxima in a discrete sample. Statist. Probab. Lett. 20(3): 209–217
- [4] T. C. Brown and M. J. Phillips (1999). Negative binomial approximation with Stein’s method. Methodol. Comput. Appl. Probab. 1(4): 407–421
- [5] F. T. Bruss and R. Grübel (2003). On the multiplicity of the maximum in a discrete random sample. Ann. Appl. Probab. 13(4): 1252–1263
- [6] F. Daly (2011). On Stein’s method, smoothing estimates in total variation distance and mixture distributions. J. Statist. Plann. Inference 141(7): 2228–2237
- [7] B. Eisenberg (2009). The number of players tied for the record. Statist. Probab. Lett. 79(3): 283–288
- [8] P. Kirschenhofer and H. Prodinger (1996). The number of winners in a discrete geometrically distributed sample. Ann. Appl. Probab. 6(2): 687–694
- [9] P. Olofsson (1999). A Poisson approximation with applications to the number of maxima in a discrete sample. Statist. Probab. Lett. 44(1): 23–27
- [10] A. G. Pakes and Y. Li (1998). Limit laws for the number of near maxima via the Poisson approximation. Statist. Probab. Lett. 40(4): 395–401
- [11] A. G. Pakes and F. W. Steutel (1997). On the number of records near the maximum. Austral. J. Statist. 32(2): 179–192
- [12]
- [13]
- [14] N. Ross (2013). Power laws in preferential attachment graphs and Stein’s method for the negative binomial distribution. Adv. in Appl. Probab. 45(3): 876–893
- [15] M. Shaked and J. G. Shanthikumar (2007). Stochastic Orders. Springer, New York