arxiv: 2505.06088 · v3 · submitted 2025-05-09 · 🧮 math.PR · math.ST · stat.TH

Approximations for the number of maxima and near-maxima in independent data

Pith reviewed 2026-05-22 15:50 UTC · model grok-4.3

classification 🧮 math.PR · math.ST · stat.TH
keywords: number of maxima, near-maxima, total variation distance, Stein's method, logarithmic distribution, negative binomial distribution, Poisson approximation, order statistics

The pith

The number of maxima and near-maxima in iid samples can be approximated using logarithmic, Poisson and negative binomial distributions with explicit total variation error bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives explicit bounds on the error, measured in total variation distance, when approximating the count of how many observations match the sample maximum or fall near an order statistic. This is done separately for discrete and absolutely continuous random variables. For discrete variables the logarithmic and Poisson distributions are the targets of approximation, with the logarithmic case requiring new development of Stein's method. For continuous variables negative binomial approximations are obtained by viewing the count as a mixed binomial. These results matter for anyone needing to understand or simulate the behavior of extremes in moderate to large samples without computing the full distribution.
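For reference, the distance and the three approximating families named above can be written out as follows. This is a sketch in standard textbook notation; the symbols $W$, $Y$, $\theta$, $\lambda$, $r$ and $p$ are assumptions made here for illustration and need not match the paper's own notation or parameterization (the negative binomial in particular has several conventions in common use).

```latex
% Total variation distance between integer-valued random variables W and Y
d_{\mathrm{TV}}(W, Y)
  = \sup_{A \subseteq \mathbb{Z}} \bigl| \mathbb{P}(W \in A) - \mathbb{P}(Y \in A) \bigr|
  = \frac{1}{2} \sum_{k \in \mathbb{Z}} \bigl| \mathbb{P}(W = k) - \mathbb{P}(Y = k) \bigr|

% Logarithmic distribution with parameter \theta \in (0, 1)
\mathbb{P}(Y = k) = \frac{-\theta^{k}}{k \log(1 - \theta)}, \qquad k = 1, 2, \ldots

% Poisson distribution with mean \lambda > 0
\mathbb{P}(Y = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots

% Negative binomial distribution (one common convention), r > 0, p \in (0, 1)
\mathbb{P}(Y = k) = \frac{\Gamma(r + k)}{\Gamma(r)\, k!}\, p^{r} (1 - p)^{k}, \qquad k = 0, 1, 2, \ldots
```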

Core claim

We derive explicit error bounds in total variation distance when approximating the number of observations equal to the maximum of the sample (in the case where X is discrete) or the number of observations within a given distance of an order statistic of the sample (in the case where X is absolutely continuous). The logarithmic and Poisson distributions are used as approximations in the discrete case, with proofs which include the development of Stein's method for a logarithmic target distribution. In the absolutely continuous case our approximations are by the negative binomial distribution, and are established by considering negative binomial approximation for mixed binomials. The cases of

What carries the argument

Stein's method for a logarithmic target distribution combined with negative binomial approximation for mixed binomials to bound total variation distance to the count of maxima or near-maxima.
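The mixed binomial view of the continuous case can be illustrated with a small simulation. The sketch below is not the paper's construction; it assumes X is Uniform(0,1), looks only at the maximum (not a general order statistic), and excludes the maximum itself from the count. Given that the maximum equals m, each of the other n − 1 observations falls within distance a of it with probability 1 − F(m − a)/F(m), so the count is binomial with a random success probability. The function names and the parameter values are hypothetical.

```python
import random


def near_maxima_direct(n, a, rng):
    """Draw n iid Uniform(0,1) values and count those within a of the maximum."""
    xs = [rng.random() for _ in range(n)]
    m = max(xs)
    # The maximum itself is excluded (x < m); this convention is an assumption.
    return sum(1 for x in xs if m - a < x < m)


def near_maxima_mixed(n, a, rng):
    """Generate the same count as a mixed binomial."""
    # Maximum of n uniforms by inverse CDF, since P(M <= x) = x**n on (0, 1).
    m = rng.random() ** (1.0 / n)
    # Conditional success probability 1 - F(m - a)/F(m) with F(x) = x.
    prob = 1.0 if m <= a else a / m
    # Given M = m, the other n - 1 observations are iid Uniform(0, m).
    return sum(1 for _ in range(n - 1) if rng.random() < prob)


def exact_mean(n, a):
    """E[count] = (n - 1) * E[min(1, a / M)] in the uniform case (n >= 2, 0 < a < 1)."""
    return (n - 1) * a ** n + n * a * (1.0 - a ** (n - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    n, a, trials = 20, 0.1, 20000
    direct = sum(near_maxima_direct(n, a, rng) for _ in range(trials)) / trials
    mixed = sum(near_maxima_mixed(n, a, rng) for _ in range(trials)) / trials
    print(direct, mixed, exact_mean(n, a))
```

The two simulated means should agree with each other and with `exact_mean` up to Monte Carlo error. Comparing means is only a sanity check on the representation, not a check of the paper's total variation bounds.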

If this is right

  • Explicit error bounds hold for the geometric distribution.
  • Explicit error bounds hold for the Gumbel distribution.
  • Explicit error bounds hold for the uniform distribution.
  • The approximations yield concrete rates that can be used when exact computation of the count distribution is intractable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bounds may help determine when simulation of extreme counts becomes reliable for moderate sample sizes.
  • The Stein's method development for the logarithmic distribution could extend to counts based on other record statistics.
  • Numerical checks of the bounds for small to moderate n would indicate when the approximations are tight enough for applications.

Load-bearing premise

The n observations are independent and identically distributed.

What would settle it

For the geometric distribution with small n, compute the exact total variation distance between the distribution of the number of maxima and the logarithmic approximation and verify whether it stays below the paper's explicit bound.
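A minimal sketch of such a check is given below. It computes the exact distribution of the number of maxima for n iid geometric observations on {1, 2, ...} with success probability p, and its total variation distance to a logarithmic distribution. Taking the logarithmic parameter equal to p is an assumption made here, motivated by a heuristic integral approximation of the exact formula; the paper's own parameter choice and its explicit bound are not reproduced on this page, so the comparison with the bound is left to the reader. All function names are hypothetical.

```python
import math


def maxima_count_pmf(n, p, m_max=500):
    """Exact pmf of the number of sample maxima for n iid Geometric(p) draws.

    P(K = k) = C(n, k) * sum_m P(X = m)**k * P(X < m)**(n - k); the sum over m
    is truncated at m_max, which must be large enough that (1 - p)**m_max is negligible.
    """
    q = 1.0 - p
    binom = [math.comb(n, k) for k in range(n + 1)]
    pmf = [0.0] * (n + 1)
    for m in range(1, m_max + 1):
        point = p * q ** (m - 1)    # P(X = m)
        below = 1.0 - q ** (m - 1)  # P(X < m)
        for k in range(1, n + 1):
            pmf[k] += binom[k] * point ** k * below ** (n - k)
    return pmf


def logarithmic_pmf(k, theta):
    """Logarithmic pmf: -theta**k / (k * log(1 - theta)), k = 1, 2, ..."""
    return -(theta ** k) / (k * math.log(1.0 - theta))


def tv_to_logarithmic(n, p):
    """Total variation distance between the maxima count and Logarithmic(p)."""
    pmf = maxima_count_pmf(n, p)
    log_pmf = [0.0] + [logarithmic_pmf(k, p) for k in range(1, n + 1)]
    diff = sum(abs(pmf[k] - log_pmf[k]) for k in range(1, n + 1))
    # Logarithmic mass on k > n, where the maxima count has no mass.
    tail = max(0.0, 1.0 - sum(log_pmf))
    return 0.5 * (diff + tail)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, tv_to_logarithmic(n, 0.5))
```

The printed distances can then be set against the paper's bound for the same n and p. The double loop costs on the order of n times m_max operations, which is fine for the small n this check targets.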

Figures

Figures reproduced from arXiv: 2505.06088 by Fraser Daly.

Figure 1: The upper bound of Theorem 1(a) in the case where …
Figure 2: The upper bound (5) for the case where X has a Gumbel distribution, evaluated for various values of a in the cases n = 20 and n = 100.
… exposition concise we will here again assume that ℓ = 1; similar calculations may be carried out for other values of ℓ. With our choice of X we have that $1 - \frac{F(x-a)}{F(x)} = \begin{cases} 1, & x \in (0, a], \\ \frac{a}{x}, & x \in (a, 1), \end{cases}$ so that for j = 1, 2 we have $M_j = n \int_0^a x^{n-1}\,dx + a^j \int_a^1 \dots$
Original abstract

In the setting where we have $n$ independent observations of a random variable $X$, we derive explicit error bounds in total variation distance when approximating the number of observations equal to the maximum of the sample (in the case where $X$ is discrete) or the number of observations within a given distance of an order statistic of the sample (in the case where $X$ is absolutely continuous). The logarithmic and Poisson distributions are used as approximations in the discrete case, with proofs which include the development of Stein's method for a logarithmic target distribution. In the absolutely continuous case our approximations are by the negative binomial distribution, and are established by considering negative binomial approximation for mixed binomials. The cases where $X$ is geometric, Gumbel and uniform are used as illustrative examples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript derives explicit total-variation error bounds for two approximation problems: (i) the number of observations equal to the sample maximum when X is discrete, approximated by logarithmic or Poisson distributions via a new Stein-method framework, and (ii) the number of observations lying within a fixed distance of an order statistic when X is absolutely continuous, approximated by the negative binomial distribution via negative-binomial bounds for mixed binomials. The claims are illustrated with geometric, Gumbel, and uniform examples.

Significance. If the error bounds are valid, the work supplies the first explicit, non-asymptotic total-variation guarantees for these counts, which appear in extreme-value and record statistics. The development of Stein’s method for the logarithmic distribution and the general mixed-binomial negative-binomial bounds are technically useful contributions that could be applied beyond the present setting.

major comments (1)
  1. [§4] §4 (absolutely continuous case): the representation of the near-maxima count as a mixed binomial conditions on the realized order statistic, yet the success indicators remain dependent on that conditioning variable. The general negative-binomial approximation bounds invoked in the proof do not appear to insert an explicit remainder term controlling this residual dependence; without it the claimed uniform TV bound may fail to hold. Please supply the precise statement of the mixed-binomial lemma and verify that the dependence is absorbed in the error term.
minor comments (2)
  1. [Abstract / Introduction] The abstract states that the proofs include 'the development of Stein's method for a logarithmic target distribution'; a short paragraph in the introduction highlighting the key technical novelty (e.g., the choice of Stein operator or the coupling) would improve readability.
  2. [§3–4] Notation for the distance parameter in the continuous case (denoted variously as 'given distance' or 'ε') should be fixed once and used consistently in all statements of the main theorems.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We address the major comment below and have revised the manuscript to incorporate the requested clarifications.

Point-by-point responses
  1. Referee: §4 (absolutely continuous case): the representation of the near-maxima count as a mixed binomial conditions on the realized order statistic, yet the success indicators remain dependent on that conditioning variable. The general negative-binomial approximation bounds invoked in the proof do not appear to insert an explicit remainder term controlling this residual dependence; without it the claimed uniform TV bound may fail to hold. Please supply the precise statement of the mixed-binomial lemma and verify that the dependence is absorbed in the error term.

    Authors: We appreciate the referee's observation and agree that this point needs more clarity. In the revised manuscript we have inserted the precise statement of the mixed-binomial negative-binomial approximation result as a new Lemma 4.2, which is a direct specialization of the general bounds from the literature we cite. The proof now explicitly bounds the total-variation distance by the sum E[ d_TV( conditional law | order statistic ) ] + d_TV( law of mixing measure, target mixing measure ). The first term is controlled by the standard mixed-binomial error (which already incorporates any dependence among the indicators that survives conditioning), while the second term bounds the variability induced by the dependence on the realized order statistic. Because both terms are bounded uniformly in the underlying distribution, the claimed uniform TV bound continues to hold. We have added this decomposition and the full statement of Lemma 4.2 to Section 4.

    Revision: yes

Circularity Check

0 steps flagged

Derivations use Stein's method on standard identities without self-referential reduction

full rationale

The paper establishes explicit total variation bounds by applying Stein's method to the logarithmic distribution (discrete maxima case) and negative binomial approximation to mixed binomials (continuous near-maxima case). These steps rest on the i.i.d. assumption to express counts as sums of indicators, followed by development of Stein equations and standard approximation results for the target distributions. No equation or bound is shown to equal a fitted parameter or quantity defined from the same sample; the error bounds are derived from distributional properties rather than by construction from the target count itself. Self-citations, if present, are not load-bearing for the central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard axioms of probability for i.i.d. sampling and on the existence of Stein operators for the logarithmic and negative-binomial families; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption: The n observations are independent and identically distributed.
    Invoked in the opening sentence to define the setting and to justify the indicator-sum representation of the maxima count.
  • standard math: Stein's method applies to the logarithmic and negative-binomial target distributions with explicit error bounds.
    The proofs are said to include the development of Stein's method for the logarithmic case.

pith-pipeline@v0.9.0 · 5655 in / 1419 out tokens · 51008 ms · 2026-05-22T15:50:13.585629+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We derive explicit error bounds in total variation distance when approximating the number of observations equal to the maximum of the sample (discrete case) or within a given distance of an order statistic (absolutely continuous case).

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1] R. Arratia, L. Goldstein and F. Kochman (2019). Size bias for one and all. Probab. Surv. 16: 1–61.
  2. [2] A. D. Barbour, L. Holst and S. Janson (1992). Poisson Approximation. Oxford University Press, Oxford.
  3. [3] J. J. A. M. Brands, F. W. Steutel and R. J. G. Wilms (1994). On the number of maxima in a discrete sample. Statist. Probab. Lett. 20(3): 209–217.
  4. [4] T. C. Brown and M. J. Phillips (1999). Negative binomial approximation with Stein's method. Methodol. Comput. Appl. Probab. 1(4): 407–421.
  5. [5] F. T. Bruss and R. Grübel (2003). On the multiplicity of the maximum in a discrete random sample. Ann. Appl. Probab. 13(4): 1252–1263.
  6. [6] F. Daly (2011). On Stein's method, smoothing estimates in total variation distance and mixture distributions. J. Statist. Plann. Inference 141(7): 2228–2237.
  7. [7] B. Eisenberg (2009). The number of players tied for the record. Statist. Probab. Lett. 79(3): 283–288.
  8. [8] P. Kirschenhofer and H. Prodinger (1996). The number of winners in a discrete geometrically distributed sample. Ann. Appl. Probab. 6(2): 687–694.
  9. [9] P. Olofsson (1999). A Poisson approximation with applications to the number of maxima in a discrete sample. Statist. Probab. Lett. 44(1): 23–27.
  10. [10] A. G. Pakes and Y. Li (1998). Limit laws for the number of near maxima via the Poisson approximation. Statist. Probab. Lett. 40(4): 395–401.
  11. [11] A. G. Pakes and F. W. Steutel (1997). On the number of records near the maximum. Austral. J. Statist. 32(2): 179–192.
  12. [12] L. Räde (1991). Problem E3436. Amer. Math. Monthly 98(4): 366.
  13. [13] N. Ross (2011). Fundamentals of Stein's method. Probab. Surv. 8: 210–293.
  14. [14] N. Ross (2013). Power laws in preferential attachment graphs and Stein's method for the negative binomial distribution. Adv. in Appl. Probab. 45(3): 876–893.
  15. [15] M. Shaked and J. G. Shanthikumar (2007). Stochastic Orders. Springer, New York.